File(s) under permanent embargo
A hybrid supervised approach to human population identification using genomics data
journal contribution
posted on 2021-03-01, 00:00 authored by Sahar Araghi, Thanh Thi Nguyen
Single nucleotide polymorphisms (SNPs) are one type of genetic variation; each SNP represents a difference in a single DNA building block, namely a nucleotide. Previous research has demonstrated that SNPs can be used to identify the correct source population of an individual. In addition, variations in DNA sequences influence human diseases, so SNP studies are helpful for personalised medicine and treatment. In the literature, unsupervised clustering methods, especially principal component analysis (PCA), have been popular for studying population structure. In this study, we investigate supervised approaches, particularly the LASSO multinomial regression classification method, for recognising an individual's genetic population of origin. We then introduce PCA-LASSO, an extension of the LASSO method that benefits from the advantageous characteristics of both PCA and LASSO regression. Experimental results on the 1000 Genomes Project dataset show that PCA-LASSO predicts an individual's population of origin with high accuracy.
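The abstract does not spell out how PCA-LASSO combines its two ingredients, so the following is only a minimal sketch of one plausible reading, not the authors' implementation: genotypes are projected onto a few principal components, and an L1-penalised (LASSO) multinomial logistic regression is then fitted on those scores by proximal gradient descent. The simulated genotypes, all function names (`simulate_genotypes`, `fit_pca`, `fit_l1_multinomial`, `run_pca_lasso`) and all parameter values are assumptions made for illustration; none of them come from the paper or from the 1000 Genomes Project data.

```python
import numpy as np


def simulate_genotypes(n_per_pop=60, n_snps=200, n_pops=3, seed=0):
    """Toy data (assumption): 0/1/2 allele counts drawn from per-population frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))
    X = np.vstack([rng.binomial(2, freqs[k], size=(n_per_pop, n_snps))
                   for k in range(n_pops)])
    y = np.repeat(np.arange(n_pops), n_per_pop)
    return X.astype(float), y


def fit_pca(X, n_components):
    """Return SNP means, the top principal axes, and the standard deviation of each score."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    scale = s[:n_components] / np.sqrt(len(X) - 1)
    return mean, vt[:n_components], scale


def project(X, mean, axes, scale):
    """Project genotypes onto the principal axes and rescale each score to unit variance."""
    return (X - mean) @ axes.T / scale


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row maxima for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def fit_l1_multinomial(X, y, n_classes, lam=0.01, lr=0.5, n_iter=500):
    """L1-penalised multinomial logistic regression via proximal gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoded labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        W = W - lr * (X.T @ (P - Y) / n)                        # gradient step
        W = np.sign(W) * np.maximum(np.abs(W) - lr * lam, 0.0)  # soft-thresholding (L1 prox)
        b = b - lr * (P - Y).mean(axis=0)                       # intercept is not penalised
    return W, b


def run_pca_lasso(X, y, n_components=10, lam=0.01, seed=0):
    """Split the data, fit PCA on the training part only, classify, return test accuracy and W."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_train = int(0.7 * len(X))
    train, test = perm[:n_train], perm[n_train:]
    mean, axes, scale = fit_pca(X[train], n_components)
    n_classes = int(y.max()) + 1
    W, b = fit_l1_multinomial(project(X[train], mean, axes, scale), y[train], n_classes, lam=lam)
    pred = np.argmax(project(X[test], mean, axes, scale) @ W + b, axis=1)
    return float((pred == y[test]).mean()), W


if __name__ == "__main__":
    X, y = simulate_genotypes()
    accuracy, W = run_pca_lasso(X, y)
    print("held-out accuracy:", accuracy)
    print("non-zero coefficients:", int(np.count_nonzero(W)), "of", W.size)
```

In this sketch PCA acts as a dimension-reduction step, and the L1 penalty then shrinks the coefficients of uninformative components towards zero. Fitting the PCA on the training individuals only keeps the held-out accuracy from being inflated by information from the test set.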
History
Journal
IEEE/ACM transactions on computational biology and bioinformatics
Volume
18
Issue
2
Season
March-April
Pagination
443 - 454
Publisher
Institute of Electrical and Electronics Engineers
Location
Piscataway, N.J.
Publisher DOI
ISSN
1545-5963
eISSN
2374-0043
Language
eng
Publication classification
C1 Refereed article in a scholarly journal
Copyright notice
2019, IEEE
Categories
No categories selected
Keywords
Population Structure
Multinomial Classification
PCA
LASSO
Personalised Treatment
Science & Technology
Life Sciences & Biomedicine
Technology
Physical Sciences
Biochemical Research Methods
Computer Science, Interdisciplinary Applications
Mathematics, Interdisciplinary Applications
Statistics & Probability
Biochemistry & Molecular Biology
Computer Science
Mathematics
Sociology
Principal component analysis
Logistics
DNA
Genomics