Cerebral Workout: Gene Expression data analysis

Paper: Graph-based iterative Group Analysis enhances microarray interpretation
Rainer Breitling, Anna Amtmann, Pawel Herzyk
src: http://www.biomedcentral.com/1471-2105/5/100

Gene Expression data analysis:
[from microarray data of a particular cell state (a particular phase of the cell cycle / diseased (brain tumor) / healthy) which we want to analyse]
Microarray data only gives the amount of cDNA measured for each gene

One needs to derive the relationships [cause-effect relationships]

metabolic pathways
signalling pathways
protein interaction maps

Identifying subgraphs (that form a community? [PNAS article]: subgraphs having a high clustering coefficient)
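To make "high clustering coefficient" concrete, here is a minimal sketch of the local clustering coefficient of a node: the fraction of pairs of its neighbours that are themselves connected. The adjacency dictionary and the node names are made up for illustration.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to check
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical graph: a triangle A-B-C with one extra node D hanging off C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
coefficients = {node: local_clustering(adj, node) for node in adj}
```

Nodes inside a tightly knit subgraph (A and B here) score close to 1, while a node that also links outwards (C) scores lower.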

bigraph (bipartite graph) - a graph with two types of nodes, where edges only join nodes of different types
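A small sketch of what a bigraph could look like here, assuming the two node types are genes and annotation terms (my reading, not spelled out in these notes). The gene and term names are hypothetical. The helper projects the bigraph onto the genes: two genes get an edge if they share at least one term.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_genes(gene_to_terms):
    """Link two genes whenever they share at least one annotation term."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    gene_graph = {gene: set() for gene in gene_to_terms}
    for genes in term_to_genes.values():
        for a, b in combinations(sorted(genes), 2):
            gene_graph[a].add(b)
            gene_graph[b].add(a)
    return gene_graph

# Hypothetical bigraph: genes on one side, annotation terms on the other.
gene_to_terms = {
    "geneA": {"glycolysis"},
    "geneB": {"glycolysis", "transport"},
    "geneC": {"transport"},
    "geneD": set(),
}
gene_graph = project_onto_genes(gene_to_terms)
```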

Good information on the existing methods of Gene Expression Analysis [And gene expression data too]
http://smd.stanford.edu/cgi-bin/search/QuerySetup.pl

good presentation : http://genome-www5.stanford.edu/help/TUTORIALS/SMD_Analysis.ppt

Slide 6 : Factors for Measurement/Process errors
Data is smoothed out using normalization: doping controls

Imp: Clustering algorithms
good example begins: slide 21

Hierarchical clustering / Self-organising maps

The gene expression microarray data for each gene is converted into an n-dimensional vector

x[i] = log(ratio)[i], i = 1 to n
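A minimal sketch of building that vector for one gene. I am assuming a two-channel array where the ratio is red intensity over green intensity, and base-2 logs; the notes only say "log(ratio)", so both choices are assumptions, and the intensities are invented.

```python
import math

def log_ratio_vector(red, green):
    """Turn paired channel intensities into a vector of log2 ratios."""
    if len(red) != len(green):
        raise ValueError("both channels need the same number of measurements")
    # Assumption: ratio = red / green, log base 2.
    return [math.log2(r / g) for r, g in zip(red, green)]

# Hypothetical intensities for one gene across n = 4 arrays.
red = [200.0, 400.0, 100.0, 50.0]
green = [100.0, 100.0, 100.0, 100.0]
x = log_ratio_vector(red, green)
```

With logs, a doubling and a halving sit at the same distance from zero (+1 and -1), which a raw ratio (2 versus 0.5) does not give you.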

How close two genes are is defined by the distance between their vectors
Metrics:

1. Euclidean - the length of the difference vector v - u, where v and u are n-tuple vectors
2. Manhattan - http://mathworld.wolfram.com/TaxicabMetric.html
3. Pearson correlation coefficient: It measures the tendency of both variables to increase and decrease together. <-- This is also a part of data mining

A high correlation can imply that the change in both is the effect of some other vector,
or that both vectors influence each other [infinite loop till saturation?]
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

4. Cosine Similarity? [this is used in document similarity] : It is the cosine of the angle between two vectors.
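The four metrics above, written out as a plain-Python sketch using the standard textbook formulas. The two example profiles are invented.

```python
import math

def euclidean(u, v):
    """Length of the difference vector v - u."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences (taxicab metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson(u, v):
    """Correlation in [-1, 1]; undefined (division by zero) for a constant vector."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)

def cosine(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical profiles: gene_b follows gene_a at twice the amplitude.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
```

Note the difference: for these two profiles the Euclidean and Manhattan distances are non-zero, while Pearson and cosine both treat them as a perfect match because only the shape of the profile counts.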

Cluster the expression data [patterns] based upon the similarity [metrics given above]
Hierarchical clustering repeatedly merges the two closest clusters, similar to Kruskal's minimum spanning tree algorithm.

This can lead to erroneous results (why?)
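A sketch of the Kruskal connection, assuming single-linkage clustering (cluster distance = distance of the closest pair of members). That is the variant that maps most directly onto Kruskal; the slides may well use a different linkage. It sorts all pairwise distances and merges with union-find until k clusters remain. Needs Python 3.8+ for math.dist; the profiles are invented.

```python
import math
from itertools import combinations

def single_linkage(vectors, k):
    """Merge the closest clusters, Kruskal-style, until k clusters remain."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair is an "edge" weighted by Euclidean distance, shortest first.
    edges = sorted(
        (math.dist(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(vectors)), 2)
    )
    clusters = len(vectors)
    for _, i, j in edges:
        if clusters <= k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # skip pairs already in the same cluster
            parent[root_j] = root_i
            clusters -= 1

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 2-dimensional expression profiles.
profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
result = single_linkage(profiles, k=3)
```

Stopping at k clusters is the same as running Kruskal and leaving out the k - 1 longest edges of the spanning tree.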

SOMs (self-organising maps) <-- This is also a part of data mining
This is a single-layer feed-forward network whose output nodes sit on a 2D/3D grid, each node holding a weight vector.
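A bare-bones SOM sketch on a 2D grid, to see the moving parts: find the best matching unit (the node whose weight vector is closest to the input), then pull that node and its grid neighbours towards the input. The learning rate, the Gaussian neighbourhood and the linear decay are common textbook choices that I am assuming, not something taken from the slides. The data is invented.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # One randomly initialised weight vector per grid node.
    weights = {(r, c): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
        for x in rng.sample(data, len(data)):        # shuffled pass over data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                pull = lr * math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (x[i] - w[i])
    return weights

# Hypothetical log-ratio profiles forming two loose groups.
data = [[0.9, 0.8], [1.0, 0.9], [-0.9, -1.0], [-0.8, -0.9]]
som = train_som(data, rows=1, cols=2)
```

After training, each gene is assigned to the grid node returned by best_matching_unit, and genes landing on the same or neighbouring nodes are read as having similar expression patterns.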

Why don't I use an edge-betweenness-based algorithm? [PNAS: Community structure in social and biological networks]
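For reference, a sketch of the edge betweenness calculation itself: for every edge, count the shortest paths between node pairs that run through it, splitting credit when several shortest paths tie. This uses a breadth-first search from each node (a Brandes-style accumulation, my choice of method) on an unweighted, undirected graph. The community algorithm would then repeatedly remove the highest-scoring edge and recompute; only the scoring step is shown. The example graph is invented.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of each edge (unweighted, undirected graph)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search, counting shortest paths from source.
        dist = {source: 0}
        paths = {source: 1}
        preds = {node: [] for node in adj}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    paths[nb] = 0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    preds[nb].append(node)
        # Walk back from the farthest nodes, sharing credit among predecessors.
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for p in preds[node]:
                share = paths[p] / paths[node] * (1.0 + credit[node])
                score[frozenset((p, node))] += share
                credit[p] += share
    # Each pair of nodes was counted once from either end.
    return {edge: s / 2.0 for edge, s in score.items()}

# Hypothetical graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = edge_betweenness(adj)
bridge = max(scores, key=scores.get)
```

The edge joining the two triangles carries all nine cross-group shortest paths, so it scores highest and would be the first edge removed.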

Tools:
Pajek
SMD (Stanford Microarray Database) - Stanford dataset

Cerebral Workout

Tuesday, February 06, 2007
