Tuesday, February 06, 2007

Gene Expression data analysis

Paper: Graph-based iterative Group Analysis enhances microarray interpretation
Rainer Breitling, Anna Amtmann, Pawel Herzyk
src: http://www.biomedcentral.com/1471-2105/5/100

Gene Expression data analysis :
[from microarray data of a particular cell state (during a particular phase of the cell cycle / diseased (brain tumor) / healthy) which we want to analyse]
Microarray data only gives the amounts of cDNA for the different genes

One needs to derive the relationships [cause-effect relationships]
  • metabolic pathways
  • signalling pathways
  • protein interaction maps

Identifying subgraphs (that form a community? [PNAS article] subgraphs having a high clustering coefficient)

bigraph (bipartite graph) - Graph with two types of nodes, with edges only between nodes of different types

Good information on the existing methods of Gene Expression Analysis [And gene expression data too]
http://smd.stanford.edu/cgi-bin/search/QuerySetup.pl

good presentation : http://genome-www5.stanford.edu/help/TUTORIALS/SMD_Analysis.ppt

Slide 6 : Factors for Measurement/Process errors
Data smoothed out using normalization: doping controls

Imp: Clustering algorithms
good example begins: slide 21

Hierarchical clustering / Self-organising maps

Each gene's expression microarray data is converted into an n-dimensional vector

x[i] = log(ratio)[i] i=1 to n
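
A minimal sketch of that conversion, assuming the ratio is sample intensity over reference intensity and that the log is base 2 (the note only says log(ratio)). The function name and the intensities are made up for illustration.

```python
import math

def log_ratio_vector(sample, reference):
    """Return x[i] = log2(sample[i] / reference[i]) for i = 1..n."""
    if len(sample) != len(reference):
        raise ValueError("both channels must cover the same n measurements")
    return [math.log2(s / r) for s, r in zip(sample, reference)]

# Hypothetical intensities for one gene across n = 4 arrays.
x = log_ratio_vector([200.0, 100.0, 50.0, 400.0],
                     [100.0, 100.0, 100.0, 100.0])
print(x)  # [1.0, 0.0, -1.0, 2.0]
```

Taking the log makes a doubling (+1) and a halving (-1) symmetric around zero, which is why the vector is built from log ratios rather than raw ratios.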

How close two genes are is defined by the distance between their vectors
Metrics:
  1. Euclidean - the straight-line distance between the two vectors in space, |v-u|, where v and u are n-tuple vectors
  2. Manhattan - http://mathworld.wolfram.com/TaxicabMetric.html
  3. Pearson correlation coefficient : It measures the tendency of both the variables to increase and decrease together.<-- This is also a part of datamining
     • This can imply that the change in them is the effect of some other vector
     • or both the vectors influence each other [infinite loop till saturation?]
     • http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
  4. Cosine Similarity? [this is used in document similarity] : It is the cosine of the angle between two vectors.
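
The four measures listed above can be sketched in plain Python as follows. The function names and the two example vectors are assumptions for illustration, and the code assumes equal-length, non-constant, non-zero vectors.

```python
import math

def euclidean(u, v):
    # Straight-line distance |v - u|.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    # Sum of absolute coordinate differences (taxicab metric).
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson(u, v):
    # Covariance of u and v divided by the product of their spreads.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def cosine(u, v):
    # Cosine of the angle between u and v.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical log-ratio profiles: g2 is exactly twice g1.
g1 = [1.0, 2.0, 3.0]
g2 = [2.0, 4.0, 6.0]
print(euclidean(g1, g2), manhattan(g1, g2))
print(pearson(g1, g2), cosine(g1, g2))
```

Here g1 and g2 rise and fall together, so Pearson and cosine both come out at 1 (up to rounding) even though the Euclidean and Manhattan distances are not zero. The correlation-style measures compare the shape of the profile, the distance measures also compare its magnitude.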

Cluster the expression data [patterns] based upon the similarity [metrics given above]
Similar to Kruskal's Minimum spanning tree algorithm.

this can lead to erroneous results (why?)
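
A rough sketch of the clustering step that makes the Kruskal comparison concrete. It assumes single-linkage merging with Euclidean distance (the tutorial may use a different linkage), and the function names and toy vectors are made up. As in Kruskal's algorithm, pairs are sorted by distance and merged with a union-find structure, except that merging stops once k clusters remain.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_linkage(vectors, k):
    """Merge the closest clusters until only k clusters remain."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first (Kruskal's sorted edge list).
    edges = sorted(
        (euclidean(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(vectors)), 2)
    )
    clusters = len(vectors)
    for _, i, j in edges:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # skip pairs already in the same cluster
            parent[ri] = rj
            clusters -= 1

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical expression vectors for four genes.
genes = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(single_linkage(genes, 2))  # [[0, 1], [2, 3]]
```

One known weakness of single linkage is chaining: a string of intermediate points can pull two otherwise distant groups into one cluster. That may be part of the answer to the "why?" above, though the tutorial should be checked for what it actually means.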

SOMs (self-organising maps)
<-- This is also a part of datamining
This is a single-layer feed-forward network whose nodes are arranged on a 2D/3D grid; each node holds a weight vector of the same dimension as the input.
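
A minimal sketch of one SOM training step on a 2D grid, to show what the description above means in practice. The grid size, learning rate, Gaussian neighbourhood and all names are assumptions for illustration, not taken from the tutorial.

```python
import math
import random

def make_grid(rows, cols, dim, seed=0):
    # Each grid node gets a random weight vector of the input dimension.
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def best_matching_unit(grid, x):
    # Grid position of the node whose weights are closest to the input x.
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return min(cells, key=lambda rc: sq_dist(grid[rc[0]][rc[1]], x))

def train_step(grid, x, lr=0.5, radius=1.0):
    # Pull the best matching unit and its grid neighbours toward x.
    br, bc = best_matching_unit(grid, x)
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2 * radius ** 2))  # 1 at the BMU
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return br, bc

grid = make_grid(3, 3, dim=2)
x = [5.0, 5.0]                      # hypothetical expression vector
r, c = best_matching_unit(grid, x)
before = sq_dist(grid[r][c], x)
train_step(grid, x)
after = sq_dist(grid[r][c], x)
print(before > after)  # True
```

Repeating this step over all gene vectors, while shrinking lr and radius, is what lets nearby grid nodes end up representing similar expression patterns.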

Why don't I use an edge-betweenness-based algorithm? [PNAS: Community structure in social and biological networks]
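
For comparison, here is a rough sketch of the edge-betweenness idea from that PNAS paper (Girvan and Newman): repeatedly remove the edge that the most shortest paths run through, until the graph splits. The graph representation (a dict of neighbour sets, no self-loops), the function names and the toy graph are assumptions for illustration; this is not code from the paper.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an unweighted graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out path counts.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every pair of endpoints was counted from both sides.
    return {e: b / 2 for e, b in bet.items()}

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Toy graph: two triangles joined by the bridge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(sorted(p) for p in split_once(graph)))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Unlike the distance-based clustering above, this works directly on a network (for example a protein interaction map) rather than on expression vectors, so the expression data would first have to be turned into a graph.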

Tools:
Pajek
SMD - Stanford Microarray Database
