The HUGO Pan-Asian dataset covers South and East Asia with the following South Asian populations:
- 23 Andhra Pradesh & Karnataka
- 10 Bengali
- 23 Bhil (Rajasthan)
- 20 Haryana
- 23 Kashmir Spiti
- 12 Marathi
- 12 Rajasthani
- 30 Singapore Indian
- 20 Uttaranchal
- 13 Uttar Pradesh
Unfortunately, the dataset's authors do not specify ethnic or caste background for most Indian groups. Instead, their focus is on Mongoloid/Caucasoid/Australoid etc.
Also, the SNP overlap with other datasets is really small. Therefore, this reference 3 admixture run was done using only 5,400 SNPs. I recommend a big bucket of salt when interpreting these results.
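To give a rough feel for why so few SNPs is a problem, here is a toy sketch in Python. It is not what ADMIXTURE does: it assumes only two source populations with known allele frequencies and independent SNPs, and every function name, the true ancestry fraction of 0.3, and the larger SNP count of 54,000 are made up for illustration. The only point is that the spread of the estimate shrinks as the number of SNPs grows.

```python
import random
import statistics


def estimate_q(n_snps, true_q, rng):
    """Least-squares estimate of one individual's ancestry fraction.

    Toy model (an assumption, not the ADMIXTURE algorithm): two source
    populations A and B with known allele frequencies, independent SNPs.
    """
    num = 0.0
    den = 0.0
    for _ in range(n_snps):
        p_a = rng.uniform(0.05, 0.95)  # allele frequency in source A
        p_b = rng.uniform(0.05, 0.95)  # allele frequency in source B
        p_mix = true_q * p_a + (1 - true_q) * p_b
        # Diploid genotype: two independent allele draws (0, 1 or 2 copies)
        g = (rng.random() < p_mix) + (rng.random() < p_mix)
        diff = p_a - p_b
        num += (g / 2 - p_b) * diff
        den += diff * diff
    return num / den


def spread(n_snps, true_q=0.3, replicates=20, seed=1):
    """Standard deviation of the estimate across simulated individuals."""
    rng = random.Random(seed)
    estimates = [estimate_q(n_snps, true_q, rng) for _ in range(replicates)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # 5,400 is the SNP count of this run; 54,000 is a hypothetical comparison.
    for n in (5400, 54000):
        print(n, "SNPs -> spread of estimate:", round(spread(n), 4))
```

A real run at K=11 also has to estimate the allele frequencies themselves from the same data, so the noise there is likely worse than in this simplified setup.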
Here is the spreadsheet with the Pan-Asian group averages for reference 3 admixture at K=11 ancestral components.
In this dataset, the Bhil have the highest South Asian + Onge percentage (90%), followed by Haryana at 82%.
This seems to show that the Austronesian speakers have two common identities, Papuan and East Asian. Siberian traces could either be noise or from a more ancient common marker.
The Uyghurs also have a high South Asian percentage, which makes me wonder whether it indicates a more robust migration route spanning the region in ancient times.
PCA does better with small numbers of SNPs.
Definitely. I tried this exercise just to have a comparison of the Pan-Asian data with the rest.
"I recommend a big bucket of salt when interpreting these results."
Agreed. The rather low number of SNPs must indeed account for these odd results:
- The Mon-Khmer speakers seem to exhibit an excess of the Onge component for the results to be plausible. In fact, Onge is centered around them!
- Most South Asian populations are, in their admixture proportions, more South Asian than normal (in terms of Ref3 K=11), and the results for Haryana and Uttarakhand (Uttaranchal) relative to the other broadly labeled Indian groups are out of sync with basic geography and the clear-cut geographic clines we see in South Asia.
I agree there are definitely miscalculations and mistakes behind these odd results. They basically contradict all the other results we've seen. Almost every population here shows some percentage of Onge, which is not seen in any of the other results.
For example, in the other results the Mlabri were shown to be 99% East Asian and 1% Siberian, but this result shows them as 34% Onge?
WHAT THE HELL? These are definitely errors and inaccurate results.
This Korean study shows 43% East Asian + 57% Siberian, but your spreadsheet shows Koreans as 18% Onge? All the other studies show that Koreans have no Onge admixture. The same goes for almost all the other Southeast Asian results, which showed no Onge admixture or only very low percentages.
Also, the Negrito results here are very strange: they are dominated by East Asian genes, while in many other studies they are predominantly Negrito. For example, the Aeta (or Ayta) results in "Mapping Human Genetic Diversity in Asia" http://humpopgenfudan.cn/p/A/A1.pdf show them to be at least 65% Negrito + 5% Papuan + 15% South Asian + 12% East Asian, with no African DNA admixture. Yet this Ayta (or Aeta) result shows them to be 41% East Asian + 3% Siberian + 13% South Asian + 13% Onge + 22% Papuan + 3% San pygmy + 1% East African.
As I said above:
"Also, the SNP overlap with other datasets is really small. Therefore, this reference 3 admixture run was done using only 5,400 SNPs. I recommend a big bucket of salt when interpreting these results."