Continuing with the admixture analysis of the Reference II dataset, here's the spreadsheet.
Besides the differences from the Reference I analysis, do take a look at the additional ethnic groups included in this dataset, especially the 8 South Asian groups: Tamil Nadu Dalit, Irula, Andhra Pradesh Madiga, Andhra Pradesh Mala, Tamil Nadu Brahmin, Andhra Pradesh Brahmin, Punjabi Arain, Nepali.
Let's start with K=6.
Note the difference between Tamil Nadu Dalits and Brahmins. The Dalits lack the European ancestral component of the Brahmins.
For K=7, the East Asian component splits into Northeast Asian and Southeast Asian.
Punjabi Arain are about the same as Sindhis (excluding those with some African ancestry) in terms of their ancestral components.
Comparing the Andhra Brahmins to the Mala and Madiga, we see the same pattern as in Tamil Nadu: Brahmins have more European and Southwest/West Asian while Mala and Madiga have more Southeast Asian and South Asian.
At K=8, the African component splits into West African and East African.
The Nepalese samples are interesting. They have about 49% South Asian, 19% Northeast Asian, 16% European and 10% Southeast Asian. So they look like a mix of South Asian and East Asian.
Similar to the previous post, here's a comparison of K=8 admixture analysis between Reference I and Reference II datasets.
Here's the average absolute difference between the two datasets for each ancestral component:
Ancestral Component | Mean(Abs(Ref1-Ref2)) |
---|---|
South Asian (C1) | 2.17% |
Southwest Asian (C2) | 1.32% |
European (C3) | 1.70% |
Southeast Asian (C4) | 2.16% |
Papuan (C5) | 0.33% |
Northeast Asian (C6) | 1.93% |
West African (C7) | 0.27% |
East African (C8) | 0.48% |
The larger differences are for Balochi, Cambodian, Dai, Han, Kalash, Lahu, Miao, Naxi, She, Singapore Chinese, Tu, Tujia, US Chinese, and Yi. Thus, it's mostly East Asian groups.
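For anyone who wants to reproduce this kind of comparison from the two spreadsheets, here is a minimal sketch in Python. It assumes each run has been loaded as a mapping from population name to component percentages, and that the average is taken over populations present in both datasets; the function names and the toy numbers are hypothetical, not values from the actual runs.

```python
from statistics import mean


def mean_abs_diff(ref1, ref2):
    """Return {component: mean |ref1 - ref2|} over populations in both tables.

    ref1 and ref2 map population name -> {component name -> percentage}.
    Only components present for every shared population in both tables are used.
    """
    shared_pops = sorted(set(ref1) & set(ref2))
    if not shared_pops:
        return {}
    components = set(ref1[shared_pops[0]])
    for pop in shared_pops:
        components &= set(ref1[pop]) & set(ref2[pop])
    return {
        comp: mean(abs(ref1[pop][comp] - ref2[pop][comp]) for pop in shared_pops)
        for comp in sorted(components)
    }


def per_population_diff(ref1, ref2):
    """Return {population: summed |ref1 - ref2| across shared components}."""
    result = {}
    for pop in sorted(set(ref1) & set(ref2)):
        comps = set(ref1[pop]) & set(ref2[pop])
        result[pop] = sum(abs(ref1[pop][c] - ref2[pop][c]) for c in comps)
    return result


# Toy input (hypothetical percentages, NOT taken from the real spreadsheets).
ref1_toy = {
    "PopA": {"South Asian": 50.0, "European": 20.0},
    "PopB": {"South Asian": 10.0, "European": 60.0},
}
ref2_toy = {
    "PopA": {"South Asian": 48.0, "European": 21.0},
    "PopB": {"South Asian": 14.0, "European": 57.0},
    "PopC": {"South Asian": 30.0, "European": 30.0},  # only in one dataset, ignored
}

by_component = mean_abs_diff(ref1_toy, ref2_toy)
by_population = per_population_diff(ref1_toy, ref2_toy)

# List populations from most to least changed between the two runs.
for pop, total in sorted(by_population.items(), key=lambda kv: kv[1], reverse=True):
    print(pop, total)
```

Sorting the per-population totals is one simple way to see which groups moved the most between the two reference sets; whether to sum or average across components there is a matter of taste.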
For K=9, we see some divergence between the ancestral components inferred from Reference II as compared to Reference I. Instead of the Kalash component in Reference I analysis, we get the Polynesian component here. This is likely due to the inclusion of Tongan and Samoan samples.
Here's a summary of the ancestral components inferred from Reference II dataset:
| K=2 | K=3 | K=4 | K=5 | K=6 | K=7 | K=8 | K=9 |
|---|---|---|---|---|---|---|---|
| Eurasian | European | S Asian | S Asian | S Asian | S Asian | S Asian | S Asian |
| African | E Asian | European | European | European | European | SW Asian | European |
|  | African | E Asian | E Asian | E Asian | SE Asian | European | SW Asian |
|  |  | African | SW Asian | SW Asian | SW Asian | SE Asian | SE Asian |
|  |  |  | African | Papuan | Papuan | Papuan | Papuan |
|  |  |  |  | African | NE Asian | NE Asian | NE Asian |
|  |  |  |  |  | African | W African | Polynesian |
|  |  |  |  |  |  | E African | W African |
|  |  |  |  |  |  |  | E African |
I might do some admixture runs for Reference II with Harappa participants later.
"Instead of the Kalash component in Reference I analysis, we get the Polynesian component here. This is likely due to the inclusion of Tongan and Samoan samples."
Interesting. I understand that it might take some time, but if you run it without those latter populations, might we get a component more like the Kalash one?
It is possible we might get the Kalash component in reference II at a higher K. Let's see.
Yes, that seems reasonable. As always, I am looking forward to your future posts!
Almost all the Chinese are now around 50% SE Asian. I didn't see this before; is it right?
The Northeast Asian component is modal among the Japanese. Thus the Chinese probably should be somewhat mixed with what I have termed Southeast Asian. However, I am going to look at the individual samples of the Chinese to see if there's variation between individuals.
Some of the Xing results are weird for brownz, IMO. Not sure I trust them totally.
I've been playing around a little with the Xing dataset. Here's a PCA, minus the African populations. (Maybe I should've removed the Amerindian ones as well.) The lousy labeling is due to my not really knowing how to use gnuplot.
Here are the South Asian populations. Note that since I included myself in this second run, there were only 40,808 SNPs after pruning -- though it doesn't look like anyone's shifted that much as a result. (As Zack noted earlier, the Xing dataset doesn't have that many SNPs in common with 23andMe's chip.) With that caveat in mind, it looks like some AP Brahmins are shifted towards the tribal/Dalit cluster. I'm the red cross with a blue box, by the way.
I tried merging in the HapMap Gujaratis, but something went wrong and they ended up clustering far away from everyone else. (And defining their own component to boot.) Maybe I forgot to extract the common SNPs -- in which case they're going in tomorrow!
Great! I just did some PCA plots too. Expect something to be up on the blog by morning.
Nice work!