A New South Asian Project

Life and work have kept me from working on the Harappa Ancestry Project for a long time. Sadly, it looks like I won’t be able to get back to it in the near future.

My friend Razib has recently started his own South Asian Genotype Project. I’m curious to see what new results he comes up with.

Submission On Hold

Currently, the Harappa Ancestry Project is on hold as I am busy with other things. Therefore, please do not send me your data.

However, if you belong to a South Asian ethnic group that is significantly different from any of the groups for which I have data, please send me an email.

Catching Up on Emails

I have been busy and haven't even looked at this project in months. It's time now to dust off the cobwebs.

Since I have not even read the emails I have been getting, that's where I am going to start.

If you sent me an email this summer, I'll try to reply to you in the next few days. If you don't hear from me by October 1, drop me a line.

Update: I just realized I have emails from Spring to answer as well.

HarappaWorld HRP0385-HRP0419

I have been working on a new admixture calculator whenever I could find some time away from real-life pursuits. However, it's still not ready and I have a lot of submissions waiting, so I am posting the HarappaWorld results for them.

I have added the HarappaWorld Admixture results for HRP0385-HRP0419 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell you anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.
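As a rough illustration of that last point, here is a minimal Python sketch that flags components small enough to be noise. The 1% standard error is the figure mentioned above; the factor of two, the function name, and the sample percentages are all assumptions made up for illustration, not anything the calculator itself does.

```python
def flag_possible_noise(results, standard_error=1.0, multiplier=2.0):
    """Return the names of non-zero components at or below the noise threshold.

    results: dict mapping component name -> percentage (0-100).
    standard_error: assumed standard error in percentage points.
    multiplier: assumed number of standard errors to treat as noise.
    """
    threshold = standard_error * multiplier
    return sorted(name for name, pct in results.items() if 0 < pct <= threshold)


# Made-up percentages for a hypothetical participant (not real project data).
sample = {
    "Component A": 52.0,
    "Component B": 35.0,
    "Component C": 11.0,
    "Component D": 1.0,
    "Component E": 1.0,
}
flagged = flag_possible_noise(sample)
print(flagged)
```

Anything this flags is not necessarily wrong; it just should not be read as evidence of exotic ancestry on its own.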

I have not yet updated the group averages.

I got the first submissions of 23andme v4 data. The missing SNP rate for v4 data with HarappaWorld is about 12-13%. So I don't expect major noise problems, though noise will be higher than for 23andme v3 data.
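For anyone curious what a missing SNP rate means in practice, here is a minimal Python sketch. It assumes the rate is the fraction of the calculator's SNPs that do not appear in a chip's SNP list; the function name and the rsIDs are hypothetical, and this is not the actual pipeline used for the project.

```python
def missing_snp_rate(calculator_snps, chip_snps):
    """Fraction of the calculator's SNPs that are absent from a chip's SNP list."""
    calculator = set(calculator_snps)
    if not calculator:
        raise ValueError("calculator SNP list is empty")
    missing = calculator - set(chip_snps)
    return len(missing) / len(calculator)


# Toy example with made-up rsIDs (not real HarappaWorld or 23andme data).
calculator_snps = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs8"]
chip_snps = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7", "rs99"]
rate = missing_snp_rate(calculator_snps, chip_snps)
print(f"{rate:.1%} of calculator SNPs are missing from the chip")
```

In this toy case one of the eight calculator SNPs is missing from the chip. SNPs that are on the chip but not in the calculator (like "rs99" here) do not count towards the rate.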

AncestryDNA

In my last Admixture run, there was a participant with results from the Ancestry.com DNA service.

Most of the SNPs genotyped by AncestryDNA are in common with 23andme and FTDNA. Thus, there was a perfect overlap between AncestryDNA and HarappaWorld admixture.

So, I am now accepting participants with Ancestry.com DNA results too.

No Comments

I am sick and tired of the weird comments posted by various people, as well as of those who feed the trolls.

I do not have the time or energy to actively monitor and moderate comments.

Therefore, there will be no comments allowed here any more.

Afghan Dataset

A paper on the genetics of the people of Afghanistan, "Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge" by Julie Di Cristofaro, Erwan Pennarun, Stéphane Mazières, Natalie M. Myres, Alice A. Lin, Shah Aga Temori, Mait Metspalu, Ene Metspalu, Michael Witzel, Roy J. King, Peter A. Underhill, Richard Villems, and Jacques Chiaroni, was published in PLoS One.

Thanks to Mait Metspalu, the data is available online. It consists of:

  • 5 Hazara
  • 5 Pashtun
  • 5 Tajik
  • 4 Turkmen
  • 5 Uzbek

Here are the HarappaWorld Admixture results for the samples in this dataset.

You can check the spreadsheet too.

Tadjik1_44Af and Pashtun2_6Af seem to be outliers and there's a possibility they are mislabeled. I would like to look into these two samples further before I calculate group averages.

You can compare these Pashtun results to HGDP Pathan and HAP Pashtun results.

Webhost Move

I have just moved this website over to a new domain registrar and webhost.

Hopefully, DNS has propagated and you are seeing this new location. To avoid confusion, comments on the old install are disabled.

Let me know if you run into any issues/problems. Also, is the blog faster or slower than before?

UPDATE: I had to re-import the database, so some recent comments were lost. Sorry!

23andme and FDA

The FDA has asked 23andme to stop its direct-to-consumer genetic testing and, as a result, 23andme has issued the following statement:

After discussion with officials from the Food and Drug Administration today, 23andMe will comply with the FDA's directive and stop offering new consumers access to health-related genetic tests while the company moves forward with the agency's regulatory review processes.

Customers who purchased kits on or after the FDA's warning letter of November 22nd will not have access to health-related results. Those customers will have access to ancestry-related genetic information and their raw data without 23andMe's interpretation of that data. They may receive health-related results in the future, depending on FDA marketing authorization.

Customers who purchased kits before November 22, 2013 will continue to have access to all the reports they've always had.

While I am disappointed at this turn of events, it does not change much for our project, since 23andme will still provide raw data downloads as well as ancestry information.

HarappaWorld HRP0375-HRP0384

I have added the HarappaWorld Admixture results for HRP0375-HRP0384 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell you anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.

I have also updated the group averages.