TY - GEN
T1 - Integrating protein family sequence similarities with gene expression to find signature gene networks in breast cancer metastasis
AU - Babaei, Sepideh
AU - Van Den Akker, Erik
AU - De Ridder, Jeroen
AU - Reinders, Marcel J T
PY - 2011
Y1 - 2011
AB - Finding robust marker genes is one of the key challenges in breast cancer research. Significant signatures identified in independent datasets often show little to no overlap, possibly due to small sample sizes, noise in gene expression measurements, and heterogeneity across patients. To find more robust markers, several studies have analyzed gene expression data by grouping functionally related genes using pathways or protein interaction data. Here we pursue a protein similarity measure based on Pfam protein family information to aid the identification of robust subnetworks for predicting metastasis. The proposed protein-to-protein similarities are derived from a protein-to-family network using family HMM profiles. Gene expression data from six breast cancer datasets are overlaid with the resulting protein-protein sequence similarity network. The results indicate that the captured protein similarities have predictive capacity that aids interpretation of the resulting signatures and improves robustness.
KW - breast cancer markers
KW - concordant signature
KW - protein-to-family distance matrix
KW - protein-to-protein sequence similarity
UR - http://www.scopus.com/inward/record.url?scp=80455162613&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-24855-9_22
DO - 10.1007/978-3-642-24855-9_22
M3 - Conference contribution
AN - SCOPUS:80455162613
SN - 9783642248542
VL - 7036 LNBI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 247
EP - 259
BT - Pattern Recognition in Bioinformatics - 6th IAPR International Conference, PRIB 2011, Proceedings
T2 - 6th IAPR International Conference on Pattern Recognition in Bioinformatics, PRIB 2011
Y2 - 2 November 2011 through 4 November 2011
ER -