BayMeth: Improved DNA methylation estimation for affinity capture sequencing data using a flexible Bayesian approach
Background: DNA methylation (DNAme) is a vital component of the epigenetic regulatory machinery, and aberrations occur in many diseases, such as cancer and diabetes. In light of recent demethylating therapeutic agents, mapping and understanding DNAme profiles offers considerable promise for reversing the aberrant states. There are several approaches to analyzing DNAme, which vary widely in cost, resolution and coverage. Affinity capture methods for DNAme (e.g. sequencing of methyl-binding domain captured regions, or MBD-seq) strike a good balance between the high cost of whole genome bisulphite sequencing (WGBS) and the low coverage of methylation arrays. However, existing statistical methods to analyze these data are unable to differentiate between hypomethylation patterns and low capture efficiency, do not offer flexibility to correct for copy number variation (CNV), do not produce practical precision estimates and can suffer from long running times.
Results: We propose an empirical Bayes framework that uses a fully methylated (i.e. SssI-treated) control sample to transform observed read densities into regional methylation estimates. In our model, inefficient capture can readily be distinguished from low methylation levels by means of larger posterior variances. Furthermore, we can integrate CNV by introducing a multiplicative offset into our Poisson model framework. Notably, our model offers analytic expressions for the mean and variance of the methylation level and is thus fast to compute. Our algorithm performs better in terms of bias, mean-squared error and coverage probabilities than existing approaches when applied to a human lung fibroblast (IMR-90) MBD-seq test dataset, where "true" methylation levels are available from WGBS. Directly integrating CNV improves estimation performance in a prostate cancer cell MBD-seq dataset.
Conclusions: Our model not only improves on existing methods, but also allows explicit modeling of CNV, accommodates context-specific prior information and affords a computationally efficient analytic estimator. Our method can be applied to methylated DNA affinity enrichment assays (e.g. MBD-seq, MeDIP-seq), and a software implementation will be freely available in the Bioconductor Repitools package.
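Illustration: the idea behind the Poisson model with a multiplicative offset can be sketched numerically. This is a minimal sketch under stated assumptions, not the Repitools implementation. It assumes that the control read count in a region is Poisson with rate lambda, that the sample read count is Poisson with rate offset × mu × lambda (mu being the methylation level in [0, 1]), that lambda has a Gamma(alpha, beta) prior and that mu has a flat prior. The function name, the default hyperparameters and the grid integration are hypothetical choices made for illustration; the paper itself reports analytic expressions for the posterior mean and variance.

```python
import numpy as np


def methylation_posterior(y_sample, y_control, offset=1.0,
                          alpha=1.0, beta=0.01, grid_size=2000):
    """Grid approximation of the posterior mean and variance of mu.

    Assumed model (illustrative only):
      y_control ~ Poisson(lambda)
      y_sample  ~ Poisson(offset * mu * lambda)
      lambda    ~ Gamma(shape=alpha, rate=beta)
      mu        ~ Uniform(0, 1)
    Integrating lambda out gives, up to a constant,
      p(mu | data) ∝ (offset*mu)^y_sample
                     * (beta + 1 + offset*mu)^-(alpha + y_sample + y_control)
    """
    # Midpoint grid on (0, 1); it never touches mu = 0, so log() is safe.
    mu = (np.arange(grid_size) + 0.5) / grid_size
    rate = offset * mu
    log_post = (y_sample * np.log(rate)
                - (alpha + y_sample + y_control) * np.log(beta + 1.0 + rate))
    # Normalise in a numerically stable way.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    mean = float(np.sum(weights * mu))
    var = float(np.sum(weights * (mu - mean) ** 2))
    return mean, var


if __name__ == "__main__":
    # Well-captured region with few sample reads: low methylation.
    print(methylation_posterior(y_sample=2, y_control=200))
    # Poorly captured region (no reads even in the fully methylated control).
    print(methylation_posterior(y_sample=0, y_control=0))
    # Region with a doubled copy number, passed as a multiplicative offset.
    print(methylation_posterior(y_sample=200, y_control=200, offset=2.0))
```

In this sketch, a region with no reads in either sample yields a posterior that stays close to the flat prior and therefore has a large variance, whereas a region with many control reads but few sample reads yields a tight posterior near zero. That contrast is what allows inefficient capture to be separated from genuinely low methylation.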
|Authors||Riebler, A.; Song, J.Z.; Statham, A.L.; Stirzaker, C.; Menigatti, M.; Mahmud, N.; Mein, C.A.; Marra, G.; Clark, S.J.; Robinson, M.D.|
|Publisher Name||GENOME BIOL|
|Published Date||2014-12-01 00:00:00|
|URL link to publisher's version||http://www.ncbi.nlm.nih.gov/pubmed/24517713|
|OpenAccess link to author's accepted manuscript version||https://publications.gimr.garvan.org.au/open-access/11469|