TY - GEN
T1 - Batch Correction of Genomic Data in Chronic Fatigue Syndrome Using CMA-ES
AU - Rincon, Alejandro Lopez
AU - Kraneveld, Aletta D.
AU - Tonda, Alberto
N1 - Publisher Copyright: © 2020 Owner/Author.
PY - 2020/7/8
Y1 - 2020/7/8
N2 - Modern genomic sequencing machines can measure thousands of probes from different specimens. Nevertheless, theoretically comparable datasets can show considerably different properties, depending on both platform and specimen, a phenomenon known as the batch effect. Batch correction aims to remove this effect from the data. One possible approach to batch correction is to find a transformation function between different datasets, but optimizing the weights of such a function is not trivial: as there is no explicit gradient to follow, traditional optimization techniques would fail. In this work, we propose to use a state-of-the-art evolutionary algorithm, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), to optimize the weights of a transformation function for batch correction. The fitness function is driven by the classification accuracy of an ensemble of algorithms on the transformed data. The case study selected to test the proposed approach is mRNA gene expression data of Chronic Fatigue Syndrome, a disease for which there is currently no established diagnostic test. The transformation function obtained from three datasets, produced from different specimens, remarkably improves the performance of classifiers on the task of diagnosing Chronic Fatigue Syndrome. The presented results are an important stepping stone towards a reliable diagnostic test for this syndrome.
UR - http://www.scopus.com/inward/record.url?scp=85089754864&partnerID=8YFLogxK
U2 - 10.1145/3377929.3389947
DO - 10.1145/3377929.3389947
M3 - Conference contribution
SN - 9781450371278
T3 - GECCO 2020 Companion - Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
SP - 277
EP - 278
BT - GECCO 2020 Companion - Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
CY - New York, NY, USA
ER -