Rainer Breitling, Anna Amtmann, Pawel Herzy
src: http://www.biomedcentral.com/1471-2105/5/100
Gene expression data analysis:
[from microarray data of a particular cell state (during a particular phase of the cell cycle / diseased (brain tumor) / healthy) which we want to analyse]
Microarray data only give the abundance (expression level) of the cDNAs of different genes, not how the genes relate to each other.
One needs to derive the relationships between them [cause-effect relationships], e.g.:
- metabolic pathways
- signalling pathways
- protein interaction maps
Identifying subgraphs (that form a community? [PNAS article]: subgraphs having a high clustering coefficient)
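The (local) clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves linked: C = 2e / (k(k - 1)) for a node with k neighbours and e links among them. A minimal sketch, assuming an undirected network stored as neighbour sets; the toy gene network and the function name `local_clustering` are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than 2 neighbours; 0 by convention here
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected gene network, stored as neighbour sets
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(adj, "A"))  # 1 of the 3 neighbour pairs is linked
```

A subgraph whose nodes mostly have a high value here is densely knit, which is the intuition behind treating it as a candidate community.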
bigraph (bipartite graph): a graph with two types of nodes, where every edge joins nodes of different types
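A graph is bipartite exactly when its nodes can be 2-coloured so that no edge joins two nodes of the same colour, and a BFS colouring checks this. A sketch with a hypothetical gene-pathway graph (genes g1..g3, pathways P1, P2); the names and the `is_bipartite` helper are illustrative only.

```python
from collections import deque

def is_bipartite(adj):
    """Try to 2-colour the graph with BFS; succeeds iff the graph is bipartite."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # neighbours get the other colour
                    queue.append(v)
                elif colour[v] == colour[u]:    # edge inside one class: not bipartite
                    return False, {}
    return True, colour

# Hypothetical gene-pathway graph: genes g1..g3, pathways P1, P2
edges = [("g1", "P1"), ("g2", "P1"), ("g2", "P2"), ("g3", "P2")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
ok, colour = is_bipartite(adj)
print(ok)
```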
Good information on the existing methods of gene expression analysis [and gene expression data too]:
http://smd.stanford.edu/cgi-bin/search/QuerySetup.pl
Good presentation: http://genome-www5.stanford.edu/help/TUTORIALS/SMD_Analysis.ppt
Slide 6: factors behind measurement/process errors
Data are smoothed out by normalization (using doping controls)
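One simple way control spots can drive normalization, assuming the doping controls should show no real change between the two channels: compute log2(R/G) for every spot and shift all values so that the controls have median 0. The function name and the median-centering choice are my assumptions, not necessarily what the presentation does.

```python
import math
from statistics import median

def normalise_log_ratios(red, green, control_idx):
    """log2(red/green) per spot, shifted so the control spots have median 0.

    Assumes the controls should show no real change (ratio 1), so any
    offset they do show is treated as dye/measurement bias.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    offset = median([log_ratios[i] for i in control_idx])
    return [x - offset for x in log_ratios]

# Hypothetical two-channel intensities; spots 0 and 1 are the controls
red = [200.0, 400.0, 800.0, 1600.0]
green = [100.0, 200.0, 100.0, 1600.0]
print(normalise_log_ratios(red, green, control_idx=[0, 1]))  # [0.0, 0.0, 2.0, -1.0]
```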
Important: clustering algorithms
A good example begins at slide 21
Hierarchical clustering / self-organising maps
Each gene's expression microarray data are converted into an n-dimensional vector:
x[i] = log(ratio[i]), for i = 1 to n (one entry per array/experiment)
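Taking the log makes up- and down-regulation symmetric: assuming log base 2 (common for two-colour arrays, but the base is my assumption), a 2-fold induction and a 2-fold repression become +1 and -1, whereas the raw ratios 2 and 0.5 are not symmetric around 1. The ratios below are made up.

```python
import math

def log_ratio_vector(ratios):
    """x[i] = log2(ratio[i]) for each of the n arrays."""
    return [math.log2(r) for r in ratios]

# Hypothetical ratios of one gene on n = 3 arrays: 2-fold up, 4-fold up, 2-fold down
print(log_ratio_vector([2.0, 4.0, 0.5]))  # [1.0, 2.0, -1.0]
```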
How close two genes are is defined by the distance between their vectors.
Metrics:
- Euclidean: the length of the difference vector v - u, where v and u are n-tuples
- Manhattan (taxicab): the sum of the absolute coordinate differences, http://mathworld.wolfram.com/TaxicabMetric.html
- Pearson correlation coefficient: measures the tendency of both variables to increase and decrease together. <-- This is also a part of data mining
- This can imply that the change in both is the effect of some other vector (a third variable)
- or that both vectors influence each other [an infinite loop until saturation?]
- http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
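The three metrics above, written out for two expression vectors as a plain-Python sketch (function names are mine; the two profiles are hypothetical).

```python
import math

def euclidean(u, v):
    """Length of the difference vector v - u."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Taxicab distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson(u, v):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Two hypothetical log-ratio profiles with the same shape but different scale
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(euclidean(x, y), manhattan(x, y), pearson(x, y))
```

The two profiles have a Pearson correlation of 1 even though their Euclidean and Manhattan distances are not zero: correlation compares the shape of the profiles, not their magnitude, which is why 1 - r is often used as the distance when clustering expression data.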
Cluster the expression data [patterns] based on similarity [using the metrics given above].
This is similar to Kruskal's minimum spanning tree algorithm.
This can lead to erroneous results (why?)
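Repeatedly joining the two closest groups is single-linkage hierarchical clustering, and it makes the same merges as Kruskal's algorithm run on the complete graph of pairwise distances. The sketch below stops once `k` clusters remain; the function name and the use of `math.dist` (Euclidean) are my choices. One commonly cited weakness of single linkage, and a possible answer to the "why?" above, is the chaining effect: one pair of close genes is enough to merge two otherwise distant groups.

```python
import math

def single_linkage(points, dist, k):
    """Single-linkage clustering into k clusters, done Kruskal-style.

    All pairwise edges are sorted by distance, and edges joining two different
    clusters are used to merge them (union-find) until k clusters remain.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # follow parent links to the cluster representative (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    clusters_left = n
    for _, i, j in edges:
        if clusters_left <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two different clusters: merge them
            parent[ri] = rj
            clusters_left -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical log-ratio profiles of four genes over two arrays
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(single_linkage(pts, math.dist, 2))   # [[0, 1], [2, 3]]
```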
SOMs (self-organising maps) <-- This is also a part of data mining
A SOM is a single-layer feed-forward network whose output nodes form a 2D/3D grid, each node holding a reference vector of the same dimension as the input.
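A minimal SOM sketch under simplifying assumptions of my own: reference vectors start spread evenly between the data's minimum and maximum, a simple "bubble" neighbourhood is used (many implementations use a Gaussian one), and learning rate and radius shrink linearly. All parameter names and defaults are hypothetical.

```python
import math
import random

def best_unit(weights, x):
    """Grid node whose reference vector is closest (Euclidean) to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Minimal SOM sketch: fit a rows x cols grid of reference vectors to data."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[i] for x in data) for i in range(dim)]
    hi = [max(x[i] for x in data) for i in range(dim)]
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    span = max(len(nodes) - 1, 1)
    # Deterministic start: spread the node vectors evenly between data min and max
    weights = {node: [lo[i] + (hi[i] - lo[i]) * k / span for i in range(dim)]
               for k, node in enumerate(nodes)}
    radius0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)   # grid neighbourhood shrinks
        for x in rng.sample(data, len(data)):     # visit samples in random order
            bmu = best_unit(weights, x)
            for node, w in weights.items():
                # "bubble" neighbourhood: move every node near the BMU on the grid
                if math.dist(node, bmu) <= radius:
                    for i in range(dim):
                        w[i] += lr * (x[i] - w[i])
    return weights

# Hypothetical log-ratio profiles forming two obvious groups
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
som = train_som(data, rows=1, cols=2)
print([best_unit(som, x) for x in data])
```

Genes mapped to the same grid node form a cluster, and neighbouring nodes on the grid tend to hold similar profiles.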
Why don't I use an edge-betweenness-based algorithm? [PNAS: Community structure in social and biological networks]
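The edge-betweenness approach in that PNAS paper (Girvan and Newman) repeatedly computes edge betweenness, removes the edge with the highest value, and recomputes, so the network falls apart into communities. A sketch of one round for an unweighted, undirected graph, using Brandes-style BFS path counting; the toy graph (two triangles joined by one bridge) is hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    For every source s: BFS counts shortest paths (sigma), then dependencies
    are pushed back from the farthest nodes, crediting each edge used.
    """
    eb = {}
    for s in adj:
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:      # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):               # farthest nodes first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                key = frozenset((v, w))
                eb[key] = eb.get(key, 0.0) + c
                delta[v] += c
    # every pair was counted once from each end
    return {e: b / 2 for e, b in eb.items()}

# Hypothetical network: triangle A-B-C and triangle D-E-F joined by C-D
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
eb = edge_betweenness(adj)
bridge = max(eb, key=eb.get)
print(sorted(bridge), eb[bridge])   # ['C', 'D'] 9.0
```

Removing the bridge C-D (all 9 cross-group pairs route through it) splits the graph into the two triangles, which is the community structure one would want to recover.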
Tools:
PAJEK
SMD - Stanford Microarray Database (datasets)