The central limit theorem for the number of mutations in the genealogy of a sample from a large population

Bibliographic Details
Published in: bioRxiv (Jan 26, 2025)
Main author: Yun-Xin, Fu
Published:
Cold Spring Harbor Laboratory Press
Subjects:
Online access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3159906389
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2025.01.23.634620  |2 doi 
035 |a 3159906389 
045 0 |b d20250126 
100 1 |a Yun-Xin, Fu 
245 1 |a The central limit theorem for the number of mutations in the genealogy of a sample from a large population 
260 |b Cold Spring Harbor Laboratory Press  |c Jan 26, 2025 
513 |a Working Paper 
520 3 |a The number K of mutations identifiable in a sample of n sequences from a large population is one of the most important summary statistics in population genetics and is ubiquitous in the analysis of DNA sequence data. K can be expressed as the sum of n-1 independent geometric random variables. Consequently, its probability generating function was established long ago, yielding its well-known expectation and variance. However, the statistical properties of K are much less well understood than those of the number of distinct alleles in a sample. This paper demonstrates that the central limit theorem holds for K, implying that K is approximately normally distributed when a large sample is drawn from a population evolving according to the Wright-Fisher model with a constant effective size, or according to the constant-in-state model, which allows population sizes to vary independently, but remain uniformly bounded, across different states of the coalescent process. Additionally, the skewness and kurtosis of K are derived, confirming that K has asymptotically the same skewness and kurtosis as a normal distribution. Furthermore, skewness converges at speed $1/\sqrt{\ln(n)}$, while kurtosis converges at speed $1/\ln(n)$. Although the overall convergence to normality is relatively slow, the distribution of K for a modest sample size is already not too far from normal; the asymptotic normality may therefore be sufficient for certain applications when the sample size is large enough. Competing Interest Statement: The authors have declared no competing interest. 
653 |a Skewness 
653 |a Kurtosis 
653 |a Genealogy 
653 |a DNA sequencing 
653 |a Random variables 
653 |a Statistical analysis 
653 |a Nucleotide sequence 
653 |a Population genetics 
653 |a Mutation 
653 |a Normal distribution 
653 |a Central limit theorem 
653 |a Statistical models 
773 0 |t bioRxiv  |g (Jan 26, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3159906389/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3159906389/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2025.01.23.634620v1
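
The abstract's description of K as a sum of n-1 independent geometric random variables can be illustrated with a short numerical sketch. It is not taken from the paper. It assumes the classical constant-size neutral coalescent with infinitely many sites, in which the term for j+1 ancestral lineages is geometric on {0, 1, 2, ...} with mean theta/j, where theta is the scaled mutation rate. This record states neither that parameterization nor the symbol theta, and the function names `k_cumulants` and `k_shape` are hypothetical. Because cumulants of independent variables add, the skewness and excess kurtosis of K follow from summing the per-term cumulants.

```python
import math


def k_cumulants(n, theta):
    """First four cumulants of K for sample size n.

    Assumption (not from the catalog record): the j-th summand,
    j = 1..n-1, is geometric on {0, 1, 2, ...} with mean m = theta / j.
    Its cumulants are m, m(1+m), m(1+m)(1+2m), m(1+m)(1+6m+6m^2).
    """
    k1 = k2 = k3 = k4 = 0.0
    for j in range(1, n):
        m = theta / j
        k1 += m
        k2 += m * (1 + m)
        k3 += m * (1 + m) * (1 + 2 * m)
        k4 += m * (1 + m) * (1 + 6 * m + 6 * m * m)
    return k1, k2, k3, k4


def k_shape(n, theta):
    """Mean, variance, skewness and excess kurtosis of K."""
    k1, k2, k3, k4 = k_cumulants(n, theta)
    return {
        "mean": k1,
        "variance": k2,
        "skewness": k3 / k2 ** 1.5,
        "excess_kurtosis": k4 / k2 ** 2,  # 0 for a normal distribution
    }


if __name__ == "__main__":
    # theta = 5.0 is an arbitrary illustrative value.
    for n in (10, 100, 1000, 10000):
        s = k_shape(n, theta=5.0)
        print(n, round(s["skewness"], 4), round(s["excess_kurtosis"], 4))
```

Under these assumptions both shape measures shrink toward the normal values as n grows, but only slowly, which is consistent with the logarithmic convergence speeds quoted in the abstract.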