Application of the PageRank algorithm to evaluate microarrays

In September 2005, Julie L. Morrison et al. published an article in the journal BMC Bioinformatics called “GeneRank: Using search engine technology for the analysis of microarray experiments.” This article can be found here: http://www.biomedcentral.com/1471-2105/6/233

Nearly every cell in an organism carries the same genes, but different genes are “turned on” and “turned off” in different cells. Cells in the kidney, for example, express different genes than cells in the brain because they serve different functions. A microarray is an experimental technique that lets scientists measure the expression levels of thousands of genes at once in a group of cells. Scientists can then compare microarrays to see the relationship between a cell’s function and its gene expression. Microarrays may be compared between different organs, different times during embryonic development, cells treated with drugs versus untreated cells, cancerous versus non-cancerous cells, etc.

Because of the huge data sets involved, one challenge of microarray analysis is deciding which genes to analyze first. Morrison et al. applied the idea behind Google’s PageRank to this problem. In class, we discussed how PageRank ranks webpages. Similarly, GeneRank ranks the genes measured in the microarray. Just as the webpages at the top of a search result are the ones a web user looks at first, the genes at the top of the GeneRank list are the ones scientists should investigate first.

The structure of GeneRank closely mirrors that of Google’s PageRank. As we discussed in class, the basic idea behind PageRank is that a webpage is highly ranked if other highly ranked webpages link to it. Similarly, in GeneRank a gene is highly ranked if it is associated with other highly ranked genes. A webpage is a node in PageRank; a gene is a node in GeneRank. In PageRank, the edges are hyperlinks. In GeneRank, an edge represents previously established biological knowledge of an association between the two genes it connects. Thus, GeneRank draws attention to the network structure connecting different genes.
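To make the node-and-edge picture concrete, here is a minimal sketch of a gene network stored as an adjacency map. The gene names and association pairs are made-up placeholders, not data from the paper, and treating each association as an undirected edge is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical association pairs; the gene names are placeholders for illustration.
associations = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
]


def build_gene_graph(pairs):
    """Return an undirected adjacency map: gene -> set of associated genes."""
    graph = defaultdict(set)
    for a, b in pairs:
        # Assumption: an association has no direction, so add the edge both ways.
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


graph = build_gene_graph(associations)

# The degree of a gene is the number of genes it is associated with.
degrees = {gene: len(neighbors) for gene, neighbors in graph.items()}
```

In this toy network, geneC has the most associations, so a connectivity-based ranking would tend to favor it, much as PageRank favors a page with many links pointing to it.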

Traditional microarray analysis depends on expression fold changes: a gene is considered important if its expression level differs greatly between the control group and the experimental group. GeneRank combines this expression data with connectivity data, that is, how closely the gene is related to other genes, especially other highly ranked genes. The relative contributions of expression and connectivity data to a gene’s priority score can be adjusted in the GeneRank algorithm. The authors suggest that GeneRank be used alongside the traditional way of analyzing microarrays.
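The trade-off between expression and connectivity can be sketched with a PageRank-style update. This is an illustration under stated assumptions, not necessarily the exact formulation in the paper: the function name `gene_rank`, the weighting parameter `d`, the fixed iteration count, and the toy data are all choices made for this example. With `d = 0` the ranking depends only on expression change; as `d` grows, connectivity counts for more.

```python
def gene_rank(graph, expression_change, d=0.5, iterations=100):
    """PageRank-style score combining expression change and connectivity.

    graph: dict mapping each gene to the set of genes it is associated with
           (assumed undirected, and every gene appears as a key).
    expression_change: dict mapping each gene to its expression change.
    d: assumed weighting parameter in [0, 1]; 0 means expression only.
    """
    genes = sorted(graph)
    # Normalize absolute expression changes so they sum to 1.
    # Assumption: at least one gene has a nonzero expression change.
    total = sum(abs(expression_change[g]) for g in genes)
    ex = {g: abs(expression_change[g]) / total for g in genes}

    rank = dict(ex)  # start the iteration from the expression-only scores
    for _ in range(iterations):
        new_rank = {}
        for g in genes:
            # Each neighbor shares its current rank equally among its own neighbors.
            incoming = sum(rank[n] / len(graph[n]) for n in graph[g])
            new_rank[g] = (1 - d) * ex[g] + d * incoming
        rank = new_rank
    return rank


# Toy network and hypothetical expression changes (placeholders, not real data).
toy_graph = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA", "geneC"},
    "geneC": {"geneA", "geneB", "geneD"},
    "geneD": {"geneC"},
}
toy_expression = {"geneA": 2.0, "geneB": -0.5, "geneC": 0.5, "geneD": 1.0}

expression_only = gene_rank(toy_graph, toy_expression, d=0.0)
combined = gene_rank(toy_graph, toy_expression, d=0.5)
```

In this toy example, geneC has a small expression change but is connected to every other gene, so its combined score is higher than its expression-only score. That is the kind of gene a fold-change ranking alone could overlook.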

