I used Eigensoft to create a PCA plot of the South Asians in our Reference I dataset (a total of 398 samples) along with the first batch of South Asian Harappa Project participants (HRP0001 to HRP0009).
The PCA software removed 2 Makranis, 1 Sindhi, 1 Balochi and 1 Brahui as outliers, thus leaving us with 402 samples to perform a PCA on.
Here are the plots for the first four eigenvectors.
If you have seen the South Asian plot at 23andMe, the first plot here isn't very different, except that it seems rotated.
UPDATE: Eigenvectors 1 through 4 explain 1.12%, 0.77%, 0.71% and 0.44% of the total variance.
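For anyone who wants to reproduce percentages like these, here is a minimal sketch in Python. It assumes you have the full list of eigenvalues from the PCA run (for example, a text file with one eigenvalue per line, which is an assumption about the file layout). The function names and the sample numbers are made up for illustration and are not the values from this run. The key point is that each share is the eigenvalue divided by the sum of all eigenvalues, not just the top few.

```python
def explained_variance(eigenvalues):
    """Return each eigenvalue's share of the total variance, in percent."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [100.0 * ev / total for ev in eigenvalues]


def read_eigenvalues(path):
    """Read one eigenvalue per line, skipping blank lines (assumed layout)."""
    with open(path) as handle:
        return [float(line.split()[0]) for line in handle if line.strip()]


# Hypothetical eigenvalues for illustration only, not the values from this run.
example = [4.0, 3.0, 2.0, 1.0]
shares = explained_variance(example)

# Ratio of variation captured by the first dimension relative to the second.
ratio_1_to_2 = example[0] / example[1]
```

The same ratio of the first two eigenvalues is what a statement like "the first dimension captures N times the variation of the second" boils down to.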
Interesting, eigenvector 4 seems to capture variation between one group of Gujaratis and everyone else. Could you tell us the size of the eigenvalues?
By the way, I ran some new PCAs with the HapMap Gujaratis and without the Amerindian populations. (I also went back to using R. gnuplot is too hard.) Here are the full PCA and a zoom into the South Asian cluster. Pretty much what you'd expect, I think. The first dimension captures about 3.3 times more variation than the second one.
I also did a couple of ADMIXTURE runs at K=10 and higher. (Spreadsheet and barplot.) There's some weird/interesting stuff going on at higher K's, like a splitting-off of a Kurdish component from the generalized Caucasian/Kurdish component (both of which are present in the South Asian populations), but I'm reluctant to put it up without examining the most likely possibility -- that I'm screwing something up.
Any idea how Dr. Doug McDonald generates this? http://www.scs.illinois.edu/~mcdonald/PCA84pops.html
It's an animated three-dimensional PCA plot.
Take a look at the source code. It looks like simple JavaScript.
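To give a rough idea of how an animated 3D scatter like that can work in general, here is a minimal sketch in Python. This is not Dr. McDonald's code and I have not checked how his page does it; it simply assumes each sample has three PCA coordinates, rotates them about the vertical axis a little per frame, and drops the depth coordinate to get 2D screen positions. All function names here are hypothetical.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point):
    """Orthographic projection: keep x and y, discard depth."""
    x, y, _ = point
    return (x, y)


def frames(points, n_frames):
    """Yield one list of 2D positions per animation frame over a full turn."""
    for i in range(n_frames):
        angle = 2 * math.pi * i / n_frames
        yield [project(rotate_y(p, angle)) for p in points]


# Hypothetical PC1/PC2/PC3 coordinates for three samples.
samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
animation = list(frames(samples, 4))
```

Drawing each frame in turn (in a browser canvas, or any plotting tool) gives the spinning effect.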