Using the reference II dataset of 548 South Asians and 38 Harappa Project South Asians that I have been working on, I ran Admixture.
The optimum number of ancestral components was 5 to 6, so I used K=6. The components are highest among the following groups:
Component | Highest among |
---|---|
C1 | Brahui, Makrani, Balochi |
C2 | TN Dalit, North Kannadi |
C3 | Irula |
C4 | Gujaratis |
C5 | Hazara |
C6 | Kalash |
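To illustrate how a "highest among" table like this can be derived, here is a minimal sketch. It assumes you have one row of admixture fractions per individual plus a population label for each row. The function name, the population labels, and the numbers are all made up for the example and are not taken from the actual run.

```python
from collections import defaultdict


def top_population_per_component(labels, q_rows):
    """For each component, return (population, mean fraction) of the
    population whose average fraction of that component is highest."""
    sums = {}
    counts = defaultdict(int)
    for pop, row in zip(labels, q_rows):
        if pop not in sums:
            sums[pop] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[pop][k] += value
        counts[pop] += 1

    # Average each component within each population.
    means = {pop: [s / counts[pop] for s in totals] for pop, totals in sums.items()}

    n_components = len(next(iter(means.values())))
    return {
        k: max(((pop, m[k]) for pop, m in means.items()), key=lambda t: t[1])
        for k in range(n_components)
    }


# Hypothetical toy data: 4 individuals, 3 components (each row sums to 1).
toy_labels = ["PopA", "PopA", "PopB", "PopC"]
toy_rows = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]

top = top_population_per_component(toy_labels, toy_rows)
for k, (pop, mean_fraction) in sorted(top.items()):
    print(f"C{k + 1}: {pop} ({mean_fraction:.2f})")
```

Note that the table above lists several groups for some components, so in practice you would look at the top few populations per component rather than only the single maximum.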
I consider the Irulas, a Scheduled Tribe from Tamil Nadu, to be problematic in a similar way to the Kalash, except that the Irulas are well scattered in their own space in the PCA plot.
Also, note that all the European, West Asian, etc. ancestry is being represented by C1. Similarly, all the East Asian ancestry is being collected in C5.
The spreadsheet showing the admixture results is here. The first sheet shows the individual results for the project participants.
The 2nd sheet shows the average (and standard deviation) for the reference populations.
The 3rd sheet shows the average and standard deviation for each cluster computed by MClust from PCA.
The 4th sheet shows the average and standard deviation for each cluster computed by MClust from MDS.
Also, take a look at the admixture percentage standard deviations. You'll notice that those are generally lower for the clusters compared to the population groups.
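A small sketch of that comparison, for one component, might look like the following. It assumes the sample standard deviation (the spreadsheet may use a different convention), and the individuals, labels, and fractions are invented purely to show the mechanics. The function name `group_stats` is likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_stats(labels, values):
    """Return {label: (mean, sample standard deviation)} for one component."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    # stdev needs at least two values; report 0.0 for singleton groups.
    return {
        label: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for label, vals in groups.items()
    }


# Hypothetical fractions of a single component for six individuals.
fractions = [0.10, 0.12, 0.50, 0.52, 0.11, 0.51]

# Grouping 1: nominal population labels (each mixes two different profiles).
population_labels = ["P1", "P1", "P1", "P2", "P2", "P2"]

# Grouping 2: cluster labels, standing in for clusters found from PCA or MDS.
cluster_labels = ["c1", "c1", "c2", "c2", "c1", "c2"]

by_population = group_stats(population_labels, fractions)
by_cluster = group_stats(cluster_labels, fractions)

for name, stats in (("population", by_population), ("cluster", by_cluster)):
    for label, (avg, sd) in sorted(stats.items()):
        print(f"{name} {label}: mean={avg:.3f} sd={sd:.3f}")
```

In this toy case the cluster grouping puts genetically similar individuals together, so the within-group spread is much smaller than under the population labels, which is the same pattern described above.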