Monthly Archives: February 2011 - Page 4

Admixture: Note on Precision

As you might have seen in the spreadsheets for the reference and for participants, I am rounding off the percentages to the nearest integer.

There is a reason for that. For one thing, there are lots of factors that can influence these results. If I choose somewhat different reference samples, the ancestral components as well as their proportions in different individuals would vary from the current case. This is especially true for minor ancestral components.

I am running admixture for project participants in batches of 10 along with all of my reference dataset. That way I can be sure that the inferred ancestral components stay the same from one batch to the next.
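The batching described above can be sketched roughly as follows. This is only an illustration, not the script actually used for the project: the function name `make_batches` and the participant IDs beyond HRP0010 are assumptions.

```python
def make_batches(participant_ids, batch_size=10):
    """Split participant IDs into consecutive batches of at most batch_size."""
    return [
        participant_ids[i:i + batch_size]
        for i in range(0, len(participant_ids), batch_size)
    ]


# Hypothetical list of 25 participant IDs in the HRP0001 style.
ids = [f"HRP{n:04d}" for n in range(1, 26)]
batches = make_batches(ids)

# Each batch would then be combined with the full reference dataset
# before an admixture run, so the reference samples are always present.
for number, batch in enumerate(batches, start=1):
    print(number, batch[0], "to", batch[-1], f"({len(batch)} samples)")
```

Because the reference samples far outnumber the ten participants in any run, they presumably dominate the inferred components, which is what keeps the components comparable across batches.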

While the percentages do not vary much for the reference samples from one admixture run to another (with different project participant samples), they do change a little. And I have seen a few changes by as much as 1-2%.

Therefore, these ancestry percentages are at most accurate up to the nearest whole number. There is absolutely no difference between 11.7% and 12.4% for example in my opinion.
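As a small illustration of the rounding, here is a sketch in Python (the helper name `round_percent` is mine, not part of any tool used here). Both example values end up as the same whole number:

```python
def round_percent(value):
    """Round a percentage to the nearest whole number."""
    # Note: Python's round() sends exact halves (e.g. 12.5) to the nearest
    # even integer; that does not matter for the values shown here.
    return round(value)


print(round_percent(11.7), round_percent(12.4))  # prints: 12 12
```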

Admixture K=2-5, HRP0001 to HRP0010

Finally, it's time to analyze the genomes of project participants. Admixture analysis is going to be done in batches of ten so that the ancestral components are stable from one run to another.

My choice of calling them "ancestral components" is deliberate. Please do not think of them as pure ancestral populations.

First, the ethnic background of the participants in this batch. I'll give the ethnicity only if I have explicit permission from the participant to make such information public. By default, I assume it to be private. Here's the summary:

Ethnicity        Count
Punjab           5
Bengal           1
Bihar            1
Tamil            1
Andhra Pradesh   1
Iran             1

Since this is the first batch, I am running admixture for all values of K to get a better handle on how things shake out. With later batches, I will run only a few specific values of K since admixture takes a long time to run.

The ancestral component percentages for project participants can be found in this spreadsheet.

It might be good to refer to the admixture runs for the reference (spreadsheet) to get a better idea of what the different ancestral components represent.

Let's start with K=2 ancestral components.

Batch 1 Admixture K=2

The cyan/African (C2) component varies from 29% to 51% among participants, which is about what you would expect from the results for the South Asian reference populations.

With K=3 where the ancestral components roughly represent European (C1/red), East Asian (C2/green) and African (C3/blue), we see the following:

Batch 1 Admixture K=3

I am HRP0001 and my numbers for K=3 are 77% European, 18% Asian and 5% African. This contrasts with my 23andme ancestry painting of 91.22% European, 8.69% Asian and 0.09% African. However, HRP0002 has closer numbers:

HRP0002 European Asian African
HAP 55% 43% 1%
23andme 57% 43% 0%
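A quick check of the HRP0002 numbers in the table above, as a sketch (the variable names are mine). It shows the per-component differences between the two estimates. It also shows that a row of rounded percentages need not add up to exactly 100, presumably because each value is rounded separately.

```python
# HRP0002 percentages copied from the table above.
hap = {"European": 55, "Asian": 43, "African": 1}
ttam = {"European": 57, "Asian": 43, "African": 0}  # 23andme

# Difference (HAP minus 23andme) for each component, in percentage points.
diff = {name: hap[name] - ttam[name] for name in hap}

# Row totals after rounding.
hap_total = sum(hap.values())
ttam_total = sum(ttam.values())

print(diff)
print(hap_total, ttam_total)
```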

We (HAP) are using a much more diverse reference population while 23andme ancestry painting is based on the basic three populations of HapMap. Also, since I am a quarter Egyptian, the likelihood of some African ancestry is high in my case.

Note that the Asian (C2) percentages vary from 18% to 44% for the South Asians in this batch, but it's low (18-22%) in Punjabis and higher in southern and eastern South Asians. It's almost negligible in our Iranian Assyrian sample.

With K=4, we finally get our South Asian ancestral component (C1/red).

Batch 1 Admixture K=4

I (HRP0001) am the only one with any noticeable African component (C4/violet), while HRP0002 has some East Asian ancestry (C3/cyan). The two South Indians have a lower European component (C2/green), as does HRP0002, who is from East Bengal.

Finally, let's take a look at K=5 ancestral components.

Batch 1 Admixture K=5

The South Asian (C1/red), East Asian (C3/green) and African (C5/magenta) components are about the same as in K=4. The new component here is C4/blue, which is the Southwest/West Asian component. This is basically a split from the K=4 European component (C2/green there, C2/yellow here). Our Assyrian sample has the highest Southwest/West Asian component, while mine is also higher than that of the South Asians because of my quarter Egyptian ancestry.

Let's continue with higher values of K next time.

Reference Admixture Analysis K=2-5

Let's do admixture analysis on my reference population.

Since I wasn't sure what value of K would be appropriate, I ran admixture with different values of K, which defines the number of ancestral populations.
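As a rough sketch of what running several values of K looks like, the snippet below only builds the command lines and does not execute anything. The input file name `reference.bed` is hypothetical, and any extra options I may have used are not shown. The basic form, input file followed by K, is the standard ADMIXTURE usage.

```python
def admixture_commands(bed_file, k_values):
    """Build one ADMIXTURE command line (as an argument list) per value of K."""
    return [["admixture", bed_file, str(k)] for k in k_values]


# "reference.bed" is a hypothetical file name used only for illustration.
commands = admixture_commands("reference.bed", range(2, 6))

for cmd in commands:
    # Printed rather than executed; each run can take a long time.
    print(" ".join(cmd))
```

Each run writes the per-sample ancestry proportions to a `.Q` file named after the input and K (for example `reference.3.Q` for K=3), which is where the percentages in the spreadsheets come from.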

The proportion of ancestral populations for each ethnic group is given in this spreadsheet. These are the mean values for that group, calculated by averaging the ancestral proportion across all the samples belonging to that group. I have also calculated the standard deviation across each ethnic group and that's included in the spreadsheet. The higher values of standard deviation are highlighted in blue (>1%) and red (>5%). Those population groups have samples that have somewhat different ancestries.
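The group summary and highlighting can be sketched like this. The sample values are made up for illustration, the function name `summarize` is mine, and I am assuming the sample standard deviation here; the spreadsheet may use the population version instead.

```python
from statistics import mean, stdev


def summarize(percentages):
    """Return (mean, standard deviation, highlight) for one component in one group."""
    m = mean(percentages)
    s = stdev(percentages)  # assumption: sample standard deviation
    if s > 5:
        flag = "red"    # samples in the group differ a lot
    elif s > 1:
        flag = "blue"   # some spread within the group
    else:
        flag = ""       # fairly homogeneous group
    return m, s, flag


# Hypothetical per-sample percentages of one component in two groups.
uniform_group = [50.0, 50.5, 49.5, 50.0]
outlier_group = [11.0, 12.0, 10.0, 55.0]

print(summarize(uniform_group))
print(summarize(outlier_group))
```

A single sample that differs strongly from the rest of its group is enough to push the standard deviation into the red range, so a red cell is a reason to look at the individual samples rather than at the group mean.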

Let's start with two ancestral populations, i.e. K = 2.

Admixture: Reference populations K=2

The second ancestral component, C2 (cyan), seems to be African, and the first one, C1 (red), is maximum among East Asians. Since all populations are constrained to be made of these two ancestral components, Europeans, Middle Easterners and South Asians all come out as about half the African component (C2) and half the East Asian one (C1). This is as I expected, given the classification of humanity into African and non-African.

The Fst divergences between estimated ancestral populations are as follows:

C1
C2 0.157

At K=3, the ancestral components can roughly be described as European, East Asian and African.

Admixture: Reference populations K=3

The component C1 (red) is maximum among Europeans and is the major ancestry component for Middle Easterners, Central Asians and South Asians. Ancestral component C2 (green) is East Asian. South Asians also have a significant fraction of C2. African populations are represented by C3 (blue). Yemenis, Mozabites and Ethiopian Jews also have appreciable proportions of this African ancestral component.

Looking at the standard deviations of the ancestral components for our sample groups, we see that while the Bedouin, Jordanians, Makrani, Moroccans, Mozabites, Saudis and Yemenis are mostly West Eurasian, their proportions of African ancestry vary quite a bit. The large standard deviation in the Paniya is due to one sample (C1=55%, C2=42%, C3=3%) being very different (i.e. much more West Eurasian) from the other three (C1=11%, C2=85%, C3=4%).

There are also a couple of Sindhis with some African admixture. These are possibly partly or wholly Siddi.

HGDP Sindhi Samples Admixture K=3

Fst divergences between estimated populations for K=3:

     C1     C2
C2   0.102
C3   0.144  0.182

With four ancestral components (K=4), component C1 (red) is a South Asian ancestral component. It is maximum among central and south Indians as well as among Papuans and Melanesians. It could thus possibly be related to the ASI (Ancestral South Indian) component. C4 (violet) is the African component. C3 (cyan) is the East Asian component and C2 (green) is the European component.

Admixture: Reference populations K=4

Fst divergences between estimated populations for K=4:

     C1     C2     C3
C2   0.071
C3   0.083  0.109
C4   0.152  0.152  0.184

When we increase K to 5, we get the following graph:

Admixture: Reference populations K=5

Ancestral component C1 (red) is Austronesian/South Asian. It is maximum among the Papuans at 75% and is higher among South Indians as compared to Pakistanis. It is about the same component as C1 in K=4.

C4 (blue) is Southwest Asian/West Asian. It peaks in Yemeni Jews at 66% and is high among Saudis, Bedouin, Samaritans, Egyptians, and Palestinians. It's 32% among Turks, so the Southwest Asian part is dominating the West Asian in this component. Notice how Ethiopians and Ethiopian Jews have about half of their ancestry from this component.

C3 (green) is the East Asian component and is the same as C3 in the K=4 analysis.

C5 (magenta) is the African ancestry component and is about the same as C4 in the K=4 analysis.

C2 (yellow) is the European component. In K=4, the European component was high among both southern and northern Europeans. Now in K=5, we have the C4 (Southwest/West Asian) component among southern Europeans, so this European component has taken on more of a north European outlook.

Fst divergences between estimated populations for K=5:

     C1     C2     C3     C4
C2   0.081
C3   0.084  0.114
C4   0.085  0.054  0.129
C5   0.154  0.165  0.186  0.155
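To read the matrix more easily, here is a small sketch that takes the K=5 values above and picks out the least and most diverged pairs of components. The variable names are mine; the numbers are copied from the table.

```python
labels = ["C1", "C2", "C3", "C4", "C5"]

# Lower triangle of the K=5 Fst matrix, one row per component.
lower = {
    "C2": [0.081],
    "C3": [0.084, 0.114],
    "C4": [0.085, 0.054, 0.129],
    "C5": [0.154, 0.165, 0.186, 0.155],
}

# Flatten into {(column component, row component): Fst}.
pairs = {}
for row, values in lower.items():
    for col, fst in zip(labels, values):
        pairs[(col, row)] = fst

closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)

print("closest:", closest, pairs[closest])
print("farthest:", farthest, pairs[farthest])
```

The smallest value is between C2 (European) and C4 (Southwest/West Asian) at 0.054, and the largest is between C3 (East Asian) and C5 (African) at 0.186.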

Let's continue this admixture analysis for higher values of K.