Gibbs Sampling can be used in population genetics research to infer population structure, that is, to assign sampled individuals to populations.
In population genetics research, it is often useful to classify the individuals in a sample into populations. Other areas of study may define populations using factors like linguistic, cultural, or physical characteristics, but it is not always easy to assign individuals to populations based on genotypes.
Pritchard et al. (2000) proposed a method that assigns individuals to populations and simultaneously estimates the allele frequencies of each population (from which the expected homozygous dominant, heterozygous, and homozygous recessive genotype frequencies follow under Hardy Weinberg Equilibrium). This method can use various genetic markers, including SNPs, RFLPs, and microsatellites. The method makes a few assumptions:
Assumes markers are unlinked loci → can be drawn as independent samples
Assumes Hardy Weinberg Equilibrium
Hardy Weinberg Equilibrium: \[p^2+2pq+q^2 = 1\] where \(p\) and \(q = 1 - p\) are the frequencies of the dominant and recessive alleles, \(p^2\) is the homozygous dominant genotype frequency, \(2pq\) is the heterozygous genotype frequency, and \(q^2\) is the homozygous recessive genotype frequency. By assuming Hardy Weinberg Equilibrium, we are stating that allele and genotype frequencies in a population will remain constant from one generation to the next.
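As a quick illustration of the Hardy Weinberg relationship, the following minimal sketch computes the three expected genotype frequencies for a biallelic locus from a single allele frequency. The function name `hwe_genotype_frequencies` and the example value `0.7` are illustrative assumptions, not part of the original method.

```python
def hwe_genotype_frequencies(p):
    """Expected genotype frequencies (AA, Aa, aa) at a biallelic locus
    under Hardy Weinberg Equilibrium, given the frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p  # frequency of the other allele
    return p * p, 2.0 * p * q, q * q


# Hypothetical example: allele A has frequency 0.7
freq_AA, freq_Aa, freq_aa = hwe_genotype_frequencies(0.7)
print(freq_AA, freq_Aa, freq_aa)
```

The three values always sum to 1 (up to floating point rounding), which is exactly the identity \(p^2 + 2pq + q^2 = 1\).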
Under these assumptions, each allele at each locus in each genotype is an independent draw from the appropriate frequency distribution.
Assume each population is modeled by a characteristic set of allele frequencies.
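Because every allele copy is treated as an independent draw from the allele frequencies of its population, the probability of the observed genotypes given the assignments is just a product over individuals, loci, and allele copies. The sketch below shows one way this could be computed. The array layout (diploid genotypes as integer allele indices of shape `(N, L, 2)`, allele frequencies of shape `(K, L, J)`) and the function name `log_likelihood` are assumptions made for illustration.

```python
import numpy as np


def log_likelihood(X, Z, P):
    """Log-probability of genotypes X given populations Z and frequencies P.

    X: int array (N, L, 2), allele index of each of the two copies per locus
    Z: int array (N,), population of origin of each individual
    P: float array (K, L, J), P[k, l, j] = frequency of allele j at locus l
       in population k
    """
    n_loci = X.shape[1]
    pops = Z[:, None, None]                  # shape (N, 1, 1)
    loci = np.arange(n_loci)[None, :, None]  # shape (1, L, 1)
    probs = P[pops, loci, X]                 # shape (N, L, 2)
    # Independence turns the product of probabilities into a sum of logs
    return float(np.log(probs).sum())


# Hypothetical toy data: 2 populations, 2 loci, 2 alleles per locus
P = np.array([[[0.9, 0.1], [0.8, 0.2]],
              [[0.2, 0.8], [0.3, 0.7]]])
X = np.array([[[0, 0], [0, 1]],
              [[1, 1], [1, 0]]])
good = log_likelihood(X, np.array([0, 1]), P)
bad = log_likelihood(X, np.array([1, 0]), P)
print(good > bad)
```

In this toy data the first individual carries mostly alleles that are common in population 0 and the second carries alleles common in population 1, so the matching assignment scores a higher log-likelihood than the swapped one.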
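Let \(X\) denote the observed genotypes, \(Z\) the unknown populations of origin of the individuals, and \(P\) the unknown allele frequencies in each population. Information about \(P\) and \(Z\) is given by the posterior distribution: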
\[\begin{aligned}P(Z, P |X) &= \frac{P(Z, P, X)}{P(X)} \\ &\propto P(P) P(Z) P(X|Z, P) \end{aligned}\]
This is a great opportunity to use Gibbs Sampling! It is usually not possible to sample from the posterior, \(P(Z, P|X)\), directly, because computing the normalizing constant \(P(X)\) would require summing over every possible assignment \(Z\) and integrating over all allele frequencies \(P\). However, it is possible to sample each unknown conditional on the other and so build an approximate sample from the posterior: \((Z_1,P_1), (Z_2, P_2), ..., (Z_n, P_n)\).
Start with a randomly drawn initial value \(Z_0\) as a hypothetical population of origin for each individual and \(P_0\) as a hypothetical set of allele frequencies. Then, iterate the following steps:
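To make the idea of alternating conditional draws concrete, here is a minimal sketch of such a sampler. It is not the authors' implementation. It assumes a uniform prior on \(Z\) over `K` populations, a flat Dirichlet(1, ..., 1) prior on each locus's allele frequencies, no admixture, and diploid genotypes stored as allele indices of shape `(N, L, 2)`. The function name `gibbs_sketch` and its parameters are hypothetical.

```python
import numpy as np


def gibbs_sketch(X, K, J, n_iter=200, seed=0):
    """Alternate draws of P | X, Z and Z | X, P; return the (Z, P) samples.

    X: int array (N, L, 2) of allele indices; K populations; J alleles per locus.
    """
    rng = np.random.default_rng(seed)
    N, L, _ = X.shape
    # Random initial assignment Z_0. The first sweep draws P from Z_0,
    # so this sketch does not need a separate initial P_0.
    Z = rng.integers(K, size=N)
    loci = np.arange(L)[:, None]  # shape (L, 1), pairs with X[i] of shape (L, 2)
    samples = []
    for _ in range(n_iter):
        # Draw P | X, Z: allele counts per population and locus, plus the
        # assumed Dirichlet(1) prior pseudo-count, give a Dirichlet posterior.
        counts = np.ones((K, L, J))
        for i in range(N):
            for l in range(L):
                for a in X[i, l]:
                    counts[Z[i], l, a] += 1
        P = np.empty((K, L, J))
        for k in range(K):
            for l in range(L):
                P[k, l] = rng.dirichlet(counts[k, l])
        # Draw Z | X, P: each individual's population is sampled with weight
        # proportional to the probability of its genotype in that population
        # (the assumed uniform prior on Z cancels out).
        for i in range(N):
            log_w = np.array([np.log(P[k][loci, X[i]]).sum() for k in range(K)])
            w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
            Z[i] = rng.choice(K, p=w / w.sum())
        samples.append((Z.copy(), P.copy()))
    return samples
```

In practice the early samples would typically be discarded as burn-in, and population labels can swap between runs, so summaries of \(Z\) should be read up to a relabeling of the populations.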