Author Archives: Zack - Page 14

Reference 3 Population Concordance

Posted by Zack on May 8, 2011 Comments Off

Dienekes came up with a population concordance ratio, which compares the IBS similarity percentages within a trio of individuals to estimate the probability that two individuals from population A are more similar to each other than either is to any individual in population B.

Please note that:

If two populations can be perfectly distinguished, then their population concordance ratio is 1. If however we randomly divide a set of individuals into two populations and try to calculate the population concordance ratio, we'll find it to be 0.25. It is possible for this ratio to be as low as zero.

If the concordance ratio between two populations is low, that does not necessarily mean that they are very similar. It's possible that a population does not form a tight cluster and has a lot of variation and thus is not distinguishable from another.

Now, here's the spreadsheet for the concordance ratios. You can focus on the South Asian population pairs here.

West, Central, South & Southeast Asian Admixture

Posted by Zack on May 7, 2011 21 comments

Another set of admixture runs. This one uses the South Asian, Middle Eastern, Caucasian, Central Asian, Southeast Asian and Oceanian samples from Reference 3.

Basically I consider these to be our target populations. The idea is to build out from here by adding a few samples from other populations to make the results better.

Right now, the absence of African, European, East Asian and Siberian populations makes some of the other populations substitute for them. For example, Siddi works as an African substitute while Aonaga works as an East Asian substitute.

Here are the admixture results. You can choose the number of ancestral components, K, from the dropdown below.

I find K=11 and K=14 to be the most interesting. They have the two lowest cross-validation errors too.

Reference I Admixture Errors

Posted by Zack on May 6, 2011 Comments Off

I have been thinking about error estimation for Admixture results for some time, since I have heard a lot of arguments about how even a 0.1% result is significant. I was skeptical of that and have rounded off my admixture run results to the nearest percent.

There was a memory leak issue in the bootstrapping code for admixture which crashed it every time I tried running it. I emailed David Alexander and he fixed it in version 1.12.

So I ran the default 200 bootstrap replicates to measure the standard error in our old Reference I K=12 admixture. The spreadsheet with population-level results is here and participant results are here.

Here are some statistics for the standard error estimates:

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
C1 S Asian	0.00%	0.02%	0.33%	0.52%	0.96%	1.93%
C2 Blch/Cauc	0.00%	0.00%	1.02%	0.79%	1.45%	2.63%
C3 Kalash	0.00%	0.01%	0.40%	0.50%	0.99%	3.76%
C4 SE Asian	0.00%	0.09%	0.37%	0.60%	1.27%	1.92%
C5 SW Asian	0.00%	0.00%	0.60%	0.66%	1.28%	2.90%
C6 Euro	0.00%	0.00%	0.35%	0.56%	1.12%	1.82%
C7 Papuan	0.00%	0.07%	0.22%	0.23%	0.36%	1.08%
C8 NE Asian	0.00%	0.07%	0.36%	0.67%	1.36%	2.45%
C9 Siberian	0.00%	0.08%	0.37%	0.51%	0.82%	2.29%
C10 E Bantu	0.00%	0.00%	0.00%	0.35%	0.72%	1.93%
C11 W Afr	0.00%	0.00%	0.00%	0.28%	0.50%	1.51%
C12 E Afr	0.00%	0.00%	0.05%	0.31%	0.60%	1.79%

You can see the mean value of the standard errors per population and note how many are over 1% (marked in red).
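The six columns in the table above (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) are the same statistics that R's summary() reports for a numeric vector. Here is a minimal sketch of how such a table could be built, assuming the standard errors sit in a matrix with one row per sample and one column per ancestral component. The matrix `se` and its values are made up for illustration and are not the project's data.

```r
# Hypothetical standard errors (as fractions): rows are samples,
# columns are ancestral components. The values are made up.
se <- matrix(
  c(0.000, 0.012, 0.004,
    0.003, 0.015, 0.000,
    0.010, 0.000, 0.002,
    0.019, 0.026, 0.001),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("C1", "C2", "C3"))
)

# summary() on a numeric vector gives Min., 1st Qu., Median, Mean,
# 3rd Qu. and Max.; apply() over columns yields one column per
# component, so transpose to get one row per component.
se_stats <- t(apply(se, 2, summary))

# Show the statistics as percentages rounded to two decimals.
print(round(100 * se_stats, 2))

# Count, per component, how many standard errors exceed 1%.
over_1pct <- colSums(se > 0.01)
print(over_1pct)
```

The same call works unchanged on a matrix of bias estimates.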

And statistics for bias estimates:

	Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
C1 S Asian	-1.104%	-0.031%	0.000%	-0.024%	0.075%	1.026%
C2 Blch/Cauc	-0.835%	-0.280%	-0.009%	-0.133%	0.000%	1.049%
C3 Kalash	-1.575%	0.000%	0.020%	0.076%	0.147%	0.615%
C4 SE Asian	-0.629%	-0.021%	0.011%	0.018%	0.087%	0.478%
C5 SW Asian	-0.691%	-0.094%	0.000%	-0.020%	0.035%	0.613%
C6 Euro	-0.572%	-0.086%	0.000%	-0.039%	0.004%	0.468%
C7 Papuan	-0.171%	0.008%	0.059%	0.070%	0.120%	0.312%
C8 NE Asian	-0.739%	0.000%	0.016%	0.034%	0.107%	0.679%
C9 Siberian	-1.044%	0.000%	0.015%	0.035%	0.103%	0.692%
C10 E Bantu	-0.412%	0.000%	0.000%	-0.007%	0.001%	0.370%
C11 W Afr	-0.261%	0.000%	0.000%	0.009%	0.005%	0.304%
C12 E Afr	-0.635%	0.000%	0.000%	-0.017%	0.010%	0.405%

You can also see the average value of the bias in each ancestral component for each population.

Since the bias is smaller than the standard error and is distributed around zero, a small percentage of an ancestral component is more likely to be real, rather than noise, when a large number of samples from a population group show it.
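To make that reasoning concrete, here is a rough sketch of how the uncertainty of a group average shrinks as more samples are added. It assumes that every individual's estimate has the same standard error and that the errors are independent and unbiased, which may not hold exactly for individuals analyzed in the same admixture run. The 0.5% standard error and the 0.3% component are hypothetical numbers chosen for illustration.

```r
# Assumption: each individual's estimate has the same standard error and
# the errors are independent and centered on zero (no bias).
se_individual <- 0.005          # hypothetical 0.5% standard error
n <- c(1, 4, 25, 100)           # number of samples in the group

# The standard error of the group mean shrinks with the square root of n.
se_group_mean <- se_individual / sqrt(n)

# A hypothetical 0.3% average component, expressed in standard errors.
z <- 0.003 / se_group_mean

print(data.frame(n = n,
                 se_mean_pct = 100 * se_group_mean,
                 z = z))
```

Under these assumptions, a 0.3% component in a single individual is well within one standard error, while the same average over 25 individuals is about three standard errors away from zero.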

Reference 3F(iltered) Admixture

Posted by Zack on May 5, 2011 19 comments

I removed all American populations and San and Pygmy (i.e., South and Central African) from Reference 3 for a better focus on our target populations.

Here are the admixture results. You can choose the number of ancestral components, K, from the dropdown below.

K=13, 14, 15 (in that order) have the lowest cross-validation error.

There are a bunch of interesting results in there. For example, the split into northern and southern European, and the split of Siberian into Siberian and Russian Far East (or Bering Strait). However, the Onge component as a proxy of the ASI does not appear. Also, we don't get as much breakdown of the South Asian populations as we would like.

Harappa Nearest IBS Neighbors

Posted by Zack on May 4, 2011 4 comments

After a long tease, here is the spreadsheet containing the top 500 nearest neighbors (using IBS similarity percentages) for the Harappa participants from HRP0001 to HRP0089.

I am also providing an R data object with the same data (except it contains all 3,975 individuals from Reference 3 and Harappa). To use this data,

Download R
Install R on your computer
When you start R, type
```
load('harappa_ibs.RData')
```
to load the data
Type
```
closest("HRP0001")
```
to find the 20 closest IBS neighbors of HRP0001. You can use any of the Harappa IDs here.
You can set the number of IBS neighbors (50, for example) to show using
```
closest("HRP0010",50)
```
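If you are curious how a lookup like this can work, here is a minimal sketch. It is not the code inside harappa_ibs.RData; the function name `closest_sketch`, the matrix `ibs`, the reference IDs and all similarity values are made up for illustration. It assumes a symmetric IBS similarity matrix whose rows and columns are named by individual ID.

```r
# Hypothetical IBS similarity matrix (symmetric, values are made up).
ids <- c("HRP0001", "HRP0002", "REF_A1", "REF_B1")
ibs <- matrix(
  c(1.00, 0.74, 0.71, 0.69,
    0.74, 1.00, 0.72, 0.70,
    0.71, 0.72, 1.00, 0.68,
    0.69, 0.70, 0.68, 1.00),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Return the n individuals most similar to `id`, excluding `id` itself.
closest_sketch <- function(id, n = 20, sim = ibs) {
  stopifnot(id %in% rownames(sim))
  others <- sim[id, colnames(sim) != id]   # similarities to everyone else
  head(sort(others, decreasing = TRUE), n) # highest similarity first
}

print(closest_sketch("HRP0001"))      # all three neighbors, best first
print(closest_sketch("HRP0001", 2))   # only the top two
```

The default of 20 mirrors the behavior described above; with fewer individuals in the matrix, the function simply returns everyone available.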

Enjoy!

100!

Posted by Zack on May 3, 2011 13 comments

Yesterday, we got to 100 participants in the Harappa Ancestry Project.

I made the project public on January 17, 2011. So, 100 submissions in 106 days. That's pretty good.

I am surprised at the speed and quantity of submissions. I probably have the largest dataset of South Asians right now.

Keep spreading the word and encouraging everyone to participate.

Accepting FTDNA Family Finder

Posted by Zack on May 2, 2011 Comments Off

In addition to 23andme data, I am now accepting the autosomal data from FTDNA Family Finder too.

This is due to FamilyTreeDNA's recent switch to the Illumina Omni chip, which has a lot more markers in common with the 23andme data.

Since FTDNA is retesting all its current customers on the new chip, even if you tested with them earlier, you should have autosomal data from the new chip which you can download and email to me at harappa@zackvision.com.

I am basically looking for participants who have at least some ancestry from the following countries/regions:

Afghanistan
Bangladesh
Bhutan
Burma
India
Iran
Maldives
Nepal
Pakistan
Sri Lanka
Tibet

But if you have ancestry from West or Central Asia or Caucasus, I am likely to accept your data too.

Details of participation are here.

April Update

Posted by Zack on May 1, 2011 5 comments

I have a total of 97 participants in the project right now who have sent me their raw data. Six of those have relatives participating and thus have to be filtered out for most analyses other than individual admixture percentages, etc., where I divide participants into small groups.

The following groups are represented:

Let's try to get to a hundred soon.

And yes, I am accepting FTDNA Family Finder (new Illumina chip) now.

Ref3 + Harappa Maps

Posted by Zack on May 1, 2011 Comments Off

More maps from The Jatt Gene using the Reference 3 and Harappa participants K=11 admixture results.

C1 South Asian Isopleth

C2 Onge Isopleth

C1 South Asian Choropleth at state/province level

C2 Onge Choropleth

As usual, Simranjit has more maps on his blog.

Harappa Reference 2 IBS Concordance

Posted by Zack on April 30, 2011 14 comments

Vasishta asked:

would it be possible to repeat the same exercise with the Reference II populations? These results seem to be far more plausible for every participant as compared to the previous ones.

Since it took only a few minutes, I calculated the scores as detailed in a previous post from the IBS measures between Harappa participants (1-80 only) and Reference 2.

The spreadsheet is here.


Harappa Ancestry Project

Genetics and South Asia
