South Asian PCA

Posted by Zack on February 13, 2011

I used Eigensoft to create a PCA plot of the South Asians in our Reference I dataset (a total of 398 samples) along with the first batch of South Asian Harappa Project participants (HRP0001 to HRP0009).

Eigensoft removed 2 Makranis, 1 Sindhi, 1 Balochi and 1 Brahui as outliers, leaving 402 of the 407 samples (398 reference samples plus 9 participants) for the PCA.
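Eigensoft's smartpca program flags outliers by computing the PCA, removing any sample that lies more than a set number of standard deviations from the mean along one of the top eigenvectors, and then recomputing. According to the smartpca documentation, the defaults are 6 standard deviations (`outliersigmathresh`), checked on the top 10 eigenvectors (`numoutlierevec`), for up to 5 rounds (`numoutlieriter`). The Python sketch below applies one round of that rule to precomputed eigenvector scores. It is not Eigensoft's code: the helper name `flag_outliers` is hypothetical, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def flag_outliers(scores, sigma=6.0, num_evec=10):
    """Return indices of samples lying more than `sigma` standard deviations
    from the mean on any of the first `num_evec` eigenvectors.

    `scores` holds one row per sample; each row lists that sample's
    coordinates on eigenvectors 1, 2, 3, ...
    Hypothetical helper: a single round only. smartpca recomputes the PCA
    after removing outliers and repeats the check.
    """
    n_evec = min(num_evec, len(scores[0]))
    flagged = set()
    for k in range(n_evec):
        column = [row[k] for row in scores]
        mu = mean(column)
        sd = pstdev(column)  # assumption: population standard deviation
        if sd == 0:
            continue  # no spread on this eigenvector, nothing to flag
        for i, value in enumerate(column):
            if abs(value - mu) > sigma * sd:
                flagged.add(i)
    return sorted(flagged)


# 49 tightly clustered samples plus one far-off sample on eigenvector 1
scores = [[(-1) ** i, 0.5 * (-1) ** i] for i in range(49)]
scores.append([1000.0, 0.0])
print(flag_outliers(scores))  # [49]
```

Removing a sample changes both the eigenvectors and the spread along them, so a sample that passes in the first round can still be flagged in a later one. That is why smartpca repeats the check instead of stopping after a single pass.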

Here are the plots for the first four eigenvectors:

South Asian PCA eig1 vs eig2

South Asian PCA eig1 vs eig3

South Asian PCA eig2 vs eig3

South Asian PCA eig1 vs eig4

South Asian PCA eig2 vs eig4

South Asian PCA eig3 vs eig4
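Each plot above shows one pair of the first four eigenvectors, and four eigenvectors give exactly six pairs. The Python sketch below reads the output file and lists the pairs to plot. It assumes the usual smartpca `.evec` layout: a first line starting with `#eigvals:`, then one line per sample giving the sample ID, its eigenvector coordinates and its population label. The function name `read_evec` and the example rows are hypothetical, and the drawing itself is left to whatever plotting tool you prefer.

```python
from itertools import combinations


def read_evec(lines, num_evec=4):
    """Parse smartpca-style .evec text (assumed layout).

    Returns (eigenvalues, samples), where samples is a list of
    (sample_id, [coord_1, ..., coord_k], population) tuples.
    """
    eigenvalues = []
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "#eigvals:":
            eigenvalues = [float(x) for x in fields[1:]]
            continue
        sample_id, population = fields[0], fields[-1]
        # keep only the first num_evec eigenvector coordinates
        coords = [float(x) for x in fields[1:-1]][:num_evec]
        samples.append((sample_id, coords, population))
    return eigenvalues, samples


# Made-up two-sample file in the assumed layout (not real results)
example = """#eigvals: 4.50 3.10 2.86 1.77
HRP0001 0.0123 -0.0456 0.0011 0.0200 HRP
HRP0002 -0.0300 0.0100 0.0050 -0.0020 HRP
"""
eigenvalues, samples = read_evec(example.splitlines())

# Every pair of the first four eigenvectors: six scatter plots in all
pairs = list(combinations(range(1, 5), 2))
for a, b in pairs:
    # (x, y, label) points for the "eig a vs eig b" plot
    points = [(c[a - 1], c[b - 1], pop) for _, c, pop in samples]
```

`combinations` yields the pairs in the order (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4). This is the same set of six plots as above, listed in a slightly different order.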

If you have seen the South Asian PCA plot at 23andMe, the first plot here (eig1 vs eig2) isn't very different, except that it appears rotated.

UPDATE: Eigenvectors 1 through 4 explain 1.12%, 0.77%, 0.71% and 0.44% of the total variance, respectively.
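One common way to get such percentages is to divide each eigenvalue by the sum of all the eigenvalues. smartpca can write the full list of eigenvalues to its eigenvalue output file. The sketch below assumes that convention. The function names `percent_explained` and `ratio` and the toy eigenvalues are illustrative only. They are not the values behind the numbers in the update.

```python
def percent_explained(eigenvalues):
    """Share of total variance (in %) captured by each eigenvector,
    computed as lambda_i / sum_j lambda_j * 100."""
    total = sum(eigenvalues)
    return [100.0 * value / total for value in eigenvalues]


def ratio(eigenvalues, i, j):
    """How many times more variance eigenvector i explains than
    eigenvector j (1-based indices, as in the plot titles)."""
    return eigenvalues[i - 1] / eigenvalues[j - 1]


toy = [2.0, 1.0, 0.5, 0.5]  # made-up eigenvalues
print(percent_explained(toy))  # [50.0, 25.0, 12.5, 12.5]
print(ratio(toy, 1, 2))        # 2.0
```

The percentages all share one denominator, so their ratios equal the eigenvalue ratios. Applied to the update's figures, eigenvector 1 explains about 1.45 times as much variance as eigenvector 2 (1.12 / 0.77).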

PCA, harappa, south asia


8 Comments.

RK February 13, 2011 at 1:38 pm

Interesting: eigenvector 4 seems to capture variation between one group of Gujaratis and everyone else. Could you tell us the size of the eigenvalues?

By the way, I ran some new PCAs with the HapMap Gujaratis and without the Amerindian populations. (I also went back to using R; gnuplot is too hard.) Here are the full PCA and a zoom into the South Asian cluster. Pretty much what you'd expect, I think. The first dimension captures about 3.3 times as much variation as the second.

I also did a couple of ADMIXTURE runs at K=10 and higher (spreadsheet and barplot). There's some weird/interesting stuff going on at higher K's, like a Kurdish component splitting off from the generalized Caucasian/Kurdish component (both of which are present in the South Asian populations), but I'm reluctant to put it up without first ruling out the most likely possibility: that I'm screwing something up.
D.I.Y. population structure inference, part 1 of many | Gene Expression | Discover Magazine - pingback on February 13, 2011 at 6:38 pm
Who are those Houston Gujus? | Gene Expression | Discover Magazine - pingback on February 14, 2011 at 6:38 pm
Who are those Houston Gujus? | Biology News by Biologged - pingback on February 14, 2011 at 8:31 pm
Simranjits February 15, 2011 at 10:03 am

Any idea how Dr. Doug McDonald generates this? http://www.scs.illinois.edu/~mcdonald/PCA84pops.html
- Zack February 15, 2011 at 11:07 am
  
  It's an animated three-dimensional PCA plot.
  
  Take a look at the page's source code. It looks like simple JavaScript.
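The basic idea behind such an animation can be sketched in a few lines. Take each sample's coordinates on three eigenvectors, rotate the point cloud a little around one axis for every frame, and draw only two of the rotated coordinates. The Python below is a guess at the technique, not Dr. McDonald's code. `rotate_frame` is a hypothetical name, and rotating about the vertical axis with a simple orthographic projection is an assumption.

```python
import math


def rotate_frame(points, angle):
    """Rotate 3-D points (x, y, z) = (eig1, eig2, eig3) about the vertical
    (y) axis by `angle` radians, then drop depth for an orthographic view.

    Returns the 2-D (screen_x, screen_y) coordinates for one frame.
    """
    c, s = math.cos(angle), math.sin(angle)
    frame = []
    for x, y, z in points:
        screen_x = c * x + s * z  # rotated horizontal coordinate
        frame.append((screen_x, y))  # the vertical axis is unchanged
    return frame


points = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
# 36 frames of 10 degrees each make one full turn of the animation
frames = [rotate_frame(points, math.radians(10 * k)) for k in range(36)]
```

Redrawing the frames in a loop gives the spinning effect. Clusters that overlap in one 2-D view separate as the cloud turns, which is the advantage of such a display over static pairwise plots.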
Singapore Indians | Harappa Ancestry Project - pingback on March 11, 2011 at 10:14 am


Harappa Ancestry Project

Genetics and South Asia