TY - JOUR
T1 - Integrating omics datasets with the OmicsPLS package
AU - el Bouhaddani, Said
AU - Uh, Hae-Won
AU - Jongbloed, Geurt
AU - Hayward, Caroline
AU - Klarić, Lucija
AU - Kiełbasa, Szymon M
AU - Houwing-Duistermaat, Jeanine
N1 - Publisher Copyright: © 2018 The Author(s).
PY - 2018/10/11
Y1 - 2018/10/11
N2 - BACKGROUND: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, this is not the case with O2PLS. RESULTS: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard and a faster alternative cross-validation method are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data. CONCLUSIONS: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").
AB - BACKGROUND: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, this is not the case with O2PLS. RESULTS: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard and a faster alternative cross-validation method are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data. CONCLUSIONS: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").
KW - Data-specific variation
KW - Joint principal components
KW - O2PLS
KW - Omics data integration
KW - R package
UR - http://www.scopus.com/inward/record.url?scp=85054611735&partnerID=8YFLogxK
U2 - 10.1186/s12859-018-2371-3
DO - 10.1186/s12859-018-2371-3
M3 - Article
C2 - 30309317
SN - 1471-2105
VL - 19
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 1
M1 - 371
ER -