ChromoPainter/fineStructure South Asians

Posted by Zack on January 31, 2012

You have probably heard of ChromoPainter/fineSTRUCTURE by now (Eurogenes, Dienekes, MDLP and Razib).

So I decided to run the South Asian sample data, on which I had earlier done PCA/MClust, through ChromoPainter and fineSTRUCTURE.

Here is the coancestry matrix among the 715 participants visualized as a heat map.

UPDATE: Here's a huge image showing the same.

fineSTRUCTURE can use this coancestry matrix to classify individuals into clusters, 52 in this case (compared to 38 using PCA and MClust). You can check the cluster assignments in a spreadsheet.

Note that I have named the clusters. That's just shorthand so we don't have to refer to them by cluster number: each cluster is labeled with the population that has the largest number of individuals in it.
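The labeling rule above can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the spreadsheet: the cluster numbers and population assignments below are made up, and `label_clusters` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def label_clusters(assignments):
    """Map each cluster number to its most common population.

    assignments: iterable of (cluster_id, population) pairs, one per individual.
    If two populations tie, the one seen first in the input wins.
    """
    by_cluster = defaultdict(Counter)
    for cluster_id, population in assignments:
        by_cluster[cluster_id][population] += 1
    return {
        cluster_id: counts.most_common(1)[0][0]
        for cluster_id, counts in by_cluster.items()
    }


# Made-up example assignments (NOT taken from the project spreadsheet).
toy_assignments = [
    (1, "Brahui"), (1, "Brahui"), (1, "Balochi"),
    (2, "Sindhi"), (2, "Pathan"), (2, "Sindhi"),
]

labels = label_clusters(toy_assignments)
print(labels)
```

A label chosen this way only names the plurality population; a cluster labeled after one group can still contain individuals from several others.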

Here's the cluster-level coancestry heat map.

And the pairwise coincidence:

And finally PCA plots for the first 10 dimensions from fineSTRUCTURE.

UPDATE (Feb 9, 2012): New PCA plots with better markers for the clusters.

ChromoPainter, Clusters, PCA, chromopainter, finestructure, paintmychromosomes, south asia


23 Comments.

Dienekes January 31, 2012 at 12:23 pm

What was the overall running time and how is it broken down into components (phasing/chromopainter/finestructure)?
- Zack January 31, 2012 at 12:45 pm
  
  6 hours for phasing.
  22 hrs for ChromoPainter.
  48-72 hrs for fineSTRUCTURE since that can't be parallelized.
  - Dienekes Pontikos January 31, 2012 at 1:56 pm
    
    That doesn't seem too bad. How many SNPs/threads and which software did you use for phasing? I have phased a dataset of similar size with 3 threads using shapeIT and that took days, so I am wondering whether I should try something else.
    - Zack January 31, 2012 at 6:26 pm
      
      90,000 SNPs and 8 threads using BEAGLE.
  - Vitasta February 1, 2012 at 2:29 am
    
    I am curious what sort of processing "oomph" you have for compute-intensive stuff like this. Performance is always relative to such things as number of processors, processor speed, cache, memory, etc., no?
    - Zack February 1, 2012 at 6:41 am
      
      I recently upgraded my computer.
SB January 31, 2012 at 1:41 pm

Great! Thanks Zack!
Can you put up a higher-resolution version of figure 1, or provide a link to download the figure? The axes are unreadable in the PNG.
- Zack January 31, 2012 at 7:04 pm
  
  Done.
ChromoPainter & fineSTRUCTURE on a South Asian data set | Gene Expression | Discover Magazine - pingback on February 1, 2012 at 5:59 am
ChromoPainter & fineSTRUCTURE on a South Asian data set | Biology News by Biologged - pingback on February 1, 2012 at 8:32 am
pconroy February 1, 2012 at 11:30 am

Zack,

Can I ask you for more specifics on your hardware? Is it, for instance:
1. Quad-processor or above?
2. How much RAM?
3. What OS, Windows 7 or some variant of Linux, like Ubuntu or something?
4. How is storage organized, RAID array, SATA or whatever?

Any other optimization tweaks?

Thanks in advance
- SB February 1, 2012 at 1:03 pm
  
  He is running a quad-core Core i7 with 8GB RAM, and Ubuntu/Win XP
  http://www.zackvision.com/weblog/2011/11/computer-upgrade/
  - Zack February 8, 2012 at 9:35 am
    
    I have finally ditched XP for Windows 7. But all the Harappa Project work is done in Ubuntu.
pconroy February 1, 2012 at 12:22 pm

Zack,

BTW, Doug McDonald finds that I have 3.1% Pathan or 3% Sindhi ancestry, while Dienekes has found that my Father and Mother have 10.3 and 9.3 of his Gedrosia component.

Am I eligible to join the Harappa Project as a result??
- Zack February 8, 2012 at 9:33 am
  
  Well eligibility is in the eye of the beholder. 🙂
  
  I don't usually refuse potential participants, but at the same time a number of my analyses, other than basic admixture, do not include many non-South-Asians.
  
  So you have to figure out if you will get anything useful from submitting to the Harappa Ancestry Project.
  - pconroy February 9, 2012 at 6:18 pm
    
    Well 23andMe tells me that 2 of my Relatives are Indian - both 5th cousins, 4-10 range - one called Bennett, one called Thakrar - who now live in NZ and Kenya respectively.
    
    So it may be that I actually have some recent South Asian ancestry?! When I search for my name Conroy in an online database of British Army stationed in India, I see that there were 38 enlisted men and 2 officers called Conroy in Bengal alone. Of course Bengal is not Pakistan, but the Connaught Rangers - an all-Irish battalion of the British Army - battled the Pathans and others in the region. I'm wondering if one of them took a "War Bride" back to Ireland??
    
    http://en.wikipedia.org/wiki/Connaught_Rangers
- AV February 9, 2012 at 9:03 pm
  
  pconroy, if you're referring to Dr. McDonald's analysis as far as your South Asian admixture is concerned, don't take it too seriously. McDonald uses what may be deemed "mixed" samples. For instance, you scored around 3.1% and 3% with the Pakistani Pashtun and Pakistani Sindhi respectively. It is very likely that these small percentages are popping up due to shared ancient ancestry with the said groups, as opposed to any real South Asian admixture. The Pathan and the Sindhi both have appreciable levels of North(-east) European admixture. It seems unlikely to me that you'd have any real, non-trivial and recent South Asian ancestry. We could say the same for Gedrosia: it seems to be found at non-trace levels in most West Eurasian populations and is probably simply a signature of generic West Eurasian ancestry as opposed to anything real.
SB February 1, 2012 at 12:59 pm

Thanks!
I guess the coancestry plot doesn't say much. The fineSTRUCTURE PCA plots are easier to read. It looks like the plots are symmetric with respect to the transpose. Is there a way to figure out which populations are donors and which are the acceptors? For example, from the PCA plot, the Vysya group (on the vertical axis on the left) has a blue line corresponding to Kanjar, Singapore 3 and Dharkar, while this is also transposed (if you look at Vysya on the horizontal axis on top). So does this mean that the genes flowed both ways?
- Zack February 8, 2012 at 9:29 am
  
  ChromoPainter can be run two ways. One is to define specific populations as donors and compute the results for everyone based on those donors.
  
  The other is an all-against-all mode. Here you assume that, for each individual, all other samples are donors. This is what I did in this analysis. So you cannot find out the direction of gene flow, but you can make inferences about clustering, haplotype similarity, etc.
  - SB February 10, 2012 at 8:14 am
    
    Thanks Zack.
The Kalash in perspective | Gene Expression | Discover Magazine - pingback on February 16, 2012 at 1:51 am
The Kalash in perspective | Biology News by Biologged - pingback on February 16, 2012 at 2:32 am
Dense South Asian ChromoPainter | Harappa Ancestry Project - pingback on February 16, 2012 at 7:04 am

Harappa Ancestry Project

Genetics and South Asia