Understanding sequencing data as compositions: an outlook and review
journal contribution
posted on 2018-08-15, 00:00 authored by Thomas Quinn, Ionas Erb, Mark Richardson, Tamsyn Crowley

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Supplementary information: Supplementary data are available at Bioinformatics online.
Journal: Bioinformatics
Volume: 34
Issue: 16
Pagination: 2870-2878
Publisher: Oxford Academic
Location: London, Eng.
ISSN: 1367-4803
eISSN: 1367-4811
Language: eng
Publication classification: C1 Refereed article in a scholarly journal
Keywords: Science & Technology; Life Sciences & Biomedicine; Technology; Physical Sciences; Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability; Biochemistry & Molecular Biology; Computer Science; Mathematics; RNA-SEQ; TRANSFORMATIONS; PACKAGE; GROWTH