Our Reference II Dataset has 3,161 samples, including 544 South Asians belonging to 24 ethnic groups. Unfortunately, we can only do our admixture analysis on about 23,000 SNPs.
The ancestral population averages for each ethnic group from the admixture analysis can be seen in this spreadsheet. I have also calculated the standard deviation of the ancestral components for the samples in each ethnic group.
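Computing the per-group averages and standard deviations is straightforward once each sample's ancestral proportions are tagged with its ethnic group. The sketch below assumes ADMIXTURE-style Q-matrix rows (one list of K proportions per sample). The group names `GroupA`/`GroupB`, the sample values, and the function `summarize_by_group` are hypothetical. It also assumes the sample (n-1) standard deviation, which the spreadsheet does not specify.

```python
import statistics
from collections import defaultdict

# Hypothetical input: one (ethnic group, proportions) pair per sample. Each
# proportions list is that sample's row of an ADMIXTURE-style Q matrix at K=4.
SAMPLES = [
    ("GroupA", [0.60, 0.30, 0.05, 0.05]),
    ("GroupA", [0.70, 0.20, 0.05, 0.05]),
    ("GroupB", [0.10, 0.80, 0.05, 0.05]),
    ("GroupB", [0.20, 0.70, 0.05, 0.05]),
]


def summarize_by_group(samples):
    """Return {group: (means, stdevs)}, one value per ancestral component."""
    rows_by_group = defaultdict(list)
    for group, proportions in samples:
        rows_by_group[group].append(proportions)

    summary = {}
    for group, rows in rows_by_group.items():
        columns = list(zip(*rows))  # regroup values by ancestral component
        means = [statistics.mean(col) for col in columns]
        # Sample standard deviation; a single-sample group is reported as 0.0.
        stdevs = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
        summary[group] = (means, stdevs)
    return summary


if __name__ == "__main__":
    for group, (means, stdevs) in sorted(summarize_by_group(SAMPLES).items()):
        print(group, [f"{m:.1%} ± {s:.1%}" for m, s in zip(means, stdevs)])
```

A group represented by a single sample has no meaningful spread, so the sketch reports 0.0 rather than letting `statistics.stdev` raise an error.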
Here are the results for K=2.
For K=3, the ancestral populations are European, East Asian, and African.
For K=4, the ancestral populations are South Asian, European, East Asian and African.
Let's compare the results of K=4 admixture analysis of Reference I and Reference II datasets.
While there is some difference in the average percentages of ancestral components computed with the two reference datasets, most of the differences are 1% or less. The mean absolute difference for the four components is as follows:
Ancestral Component | Mean absolute difference (Ref I vs Ref II) |
---|---|
South Asian (C1) | 0.92% |
European (C2) | 0.58% |
East Asian (C3) | 0.52% |
African (C4) | 0.32% |
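One way to produce a table like this is to take, for each component, the absolute difference between the two runs' group averages and then average it over the ethnic groups present in both. The sketch below assumes that definition (averaging over groups rather than individual samples). The numbers in `REF1`/`REF2` and the helper `mean_abs_diff` are hypothetical and do not reproduce the actual results.

```python
# Hypothetical per-group K=4 averages (in %), keyed by ethnic group. The
# values are invented for illustration only; they are not the real results.
REF1 = {"GroupA": [65.0, 25.0, 5.0, 5.0], "GroupB": [15.0, 75.0, 5.0, 5.0]}
REF2 = {"GroupA": [66.0, 24.0, 5.5, 4.5], "GroupB": [14.0, 75.5, 5.0, 5.5]}


def mean_abs_diff(ref1, ref2):
    """Mean over shared groups of |ref1 - ref2|, one value per component."""
    groups = sorted(ref1.keys() & ref2.keys())
    n_components = len(ref1[groups[0]])
    return [
        sum(abs(ref1[g][i] - ref2[g][i]) for g in groups) / len(groups)
        for i in range(n_components)
    ]


if __name__ == "__main__":
    for i, value in enumerate(mean_abs_diff(REF1, REF2), start=1):
        print(f"C{i}: {value:.2f}%")
```

Using absolute values keeps shifts in opposite directions for different groups from cancelling out.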
I have highlighted the larger differences, which affect the Balochi, Kalash, Malayan, Melanesian, Papuan, and Samaritan groups. Even so, the largest change is only about 5%.
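Picking out the groups that shifted the most can be automated by taking, for each group, the largest per-component change between the two runs and comparing it with a cutoff. The 2-percentage-point `threshold`, the group names, the values, and the helper `flag_large_changes` are all hypothetical. The highlighting above does not state a specific cutoff.

```python
# Hypothetical per-group K=4 averages (in %); invented for illustration.
REF1 = {"GroupA": [60.0, 30.0, 5.0, 5.0], "GroupB": [15.0, 75.0, 5.0, 5.0]}
REF2 = {"GroupA": [55.0, 34.0, 6.0, 5.0], "GroupB": [14.0, 75.5, 5.0, 5.5]}


def flag_large_changes(ref1, ref2, threshold=2.0):
    """Return {group: largest |ref1 - ref2|} for groups above the threshold."""
    flagged = {}
    for group in sorted(ref1.keys() & ref2.keys()):
        largest = max(abs(a - b) for a, b in zip(ref1[group], ref2[group]))
        if largest > threshold:
            flagged[group] = largest
    return flagged


if __name__ == "__main__":
    for group, change in flag_large_changes(REF1, REF2).items():
        print(f"{group}: largest change {change:.1f}%")
```

With these made-up numbers only `GroupA` is flagged, with a largest change of 5.0 percentage points.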
Let's also look at the Fst divergences between the ancestral components. Here they are for the Reference I admixture results:
Fst | C1 | C2 | C3 |
---|---|---|---|
C2 | 0.071 | | |
C3 | 0.083 | 0.109 | |
C4 | 0.152 | 0.152 | 0.184 |
And for Reference II:
Fst | C1 | C2 | C3 |
---|---|---|---|
C2 | 0.074 | | |
C3 | 0.086 | 0.118 | |
C4 | 0.156 | 0.159 | 0.194 |
The Fst numbers for Reference II are somewhat higher.
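To put "somewhat higher" in numbers, the sketch below takes the pairwise Fst values from the two tables and computes the absolute and relative change for each pair of components. The Fst values come from the tables. The dictionary layout and the helper `fst_changes` are illustrative choices.

```python
# Pairwise Fst between the K=4 ancestral components, copied from the tables
# above (C1 South Asian, C2 European, C3 East Asian, C4 African).
FST_REF1 = {
    ("C1", "C2"): 0.071, ("C1", "C3"): 0.083, ("C2", "C3"): 0.109,
    ("C1", "C4"): 0.152, ("C2", "C4"): 0.152, ("C3", "C4"): 0.184,
}
FST_REF2 = {
    ("C1", "C2"): 0.074, ("C1", "C3"): 0.086, ("C2", "C3"): 0.118,
    ("C1", "C4"): 0.156, ("C2", "C4"): 0.159, ("C3", "C4"): 0.194,
}


def fst_changes(before, after):
    """Return {pair: (absolute change, relative change)} rounded to 3 places."""
    changes = {}
    for pair, old in before.items():
        delta = after[pair] - old
        changes[pair] = (round(delta, 3), round(delta / old, 3))
    return changes


if __name__ == "__main__":
    for pair, (delta, rel) in fst_changes(FST_REF1, FST_REF2).items():
        print(f"{pair[0]}-{pair[1]}: {delta:+.3f} ({rel:+.1%})")
```

Every pair increases, by 0.003 to 0.010 in absolute terms, or roughly 3% to 8% relative to the Reference I value. The largest relative increase is between C2 and C3.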
Considering that Reference II has only one-eighth of the SNPs of Reference I, the results are fairly good.
Here's the K=5 admixture analysis for Reference II:
Higher K values to follow.