Reference II Admixture Analysis K=14

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  Southwest Asian
C5  European
C6  Southeast Asian
C7  Chinese
C8  Polynesian
C9  Siberian
C10 Papuan
C11 Japanese
C12 West African
C13 East African
C14 Bushman

Fst divergences dendrogram between estimated ancestral populations for K=14:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference II Admixture Analysis K=13

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  European
C5  Mediterranean
C6  Southeast Asian
C7  Northeast Asian
C8  Southwest Asian
C9  Polynesian
C10 Papuan
C11 West African
C12 East African
C13 Bushman

Fst divergences dendrogram between estimated ancestral populations for K=13:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference II Admixture Analysis K=12

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Kalash
C3  Southwest Asian
C4  European
C5  Southeast Asian
C6  Northeast Asian
C7  Papuan
C8  Siberian
C9  East Bantu
C10 Bushman
C11 East African
C12 West African

Fst divergences dendrogram between estimated ancestral populations for K=12:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference II Admixture Analysis K=11

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  Southwest Asian
C5  Southeast Asian
C6  European
C7  Northeast Asian
C8  Papuan
C9  West African
C10 Bushman
C11 East African

Yes, I forgot to exclude the !Kung when I removed the San and Pygmy. It's too late to do that now since I don't want to start over with Reference II.

Fst divergences dendrogram between estimated ancestral populations for K=11:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference I Admixture Analysis K=14

Continuing with Reference I admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  Southeast Asian
C5  European
C6  Mediterranean
C7  Melanesian
C8  Japanese
C9  Siberian
C10 Papuan
C11 West African
C12 Southwest Asian
C13 Chinese
C14 East African

The Eastern Bantu component is gone. Papuan has split into Papuan and Melanesian. And Northeast Asian has split into Japanese and Chinese.

Fst divergences between estimated populations for K=14:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference I Admixture Analysis K=13

Continuing with Reference I admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Balochistan/Caucasus
C3  Kalash
C4  Southeast Asian
C5  European
C6  Mediterranean
C7  Papuan
C8  Northeast Asian
C9  Siberian
C10 Southwest Asian
C11 East African Bantus
C12 West African
C13 East African

The new ancestral component at K=13 is C6. I am calling it Mediterranean because it peaks among Mozabites (63%) and Sardinians (61%), and it is significant all around the Mediterranean.

Northern European groups lose some of the Pakistani/Caucasian component they had at K=12 and gain in the K=13 European component, so C5 can now probably be called Northern European. Most of the new Mediterranean component comes from the K=12 Southwest Asian component.

Fst divergences between estimated populations for K=13:

     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10    C11    C12
C2   0.056
C3   0.066  0.062
C4   0.089  0.124  0.136
C5   0.061  0.035  0.066  0.125
C6   0.081  0.062  0.090  0.147  0.047
C7   0.167  0.204  0.219  0.202  0.205  0.226
C8   0.080  0.117  0.128  0.032  0.117  0.140  0.190
C9   0.085  0.114  0.126  0.059  0.113  0.137  0.203  0.039
C10  0.082  0.057  0.094  0.146  0.057  0.057  0.225  0.139  0.138
C11  0.145  0.152  0.177  0.192  0.158  0.163  0.258  0.187  0.191  0.152
C12  0.154  0.162  0.186  0.201  0.168  0.173  0.266  0.195  0.199  0.162  0.014
C13  0.108  0.108  0.136  0.157  0.113  0.114  0.225  0.151  0.154  0.106  0.035  0.041
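
For anyone who wants to work with these numbers, here is a minimal Python sketch that expands the lower-triangular Fst values above into a full symmetric matrix and picks out the least divergent pair of components. The function names `expand` and `closest_pair` are my own for illustration; they are not part of Admixture.

```python
# Lower triangle of the K=13 Fst matrix, copied from the values above.
# Line i (1-based) holds the divergences of component C(i+1) from C1..Ci.
LOWER = """\
0.056
0.066 0.062
0.089 0.124 0.136
0.061 0.035 0.066 0.125
0.081 0.062 0.090 0.147 0.047
0.167 0.204 0.219 0.202 0.205 0.226
0.080 0.117 0.128 0.032 0.117 0.140 0.190
0.085 0.114 0.126 0.059 0.113 0.137 0.203 0.039
0.082 0.057 0.094 0.146 0.057 0.057 0.225 0.139 0.138
0.145 0.152 0.177 0.192 0.158 0.163 0.258 0.187 0.191 0.152
0.154 0.162 0.186 0.201 0.168 0.173 0.266 0.195 0.199 0.162 0.014
0.108 0.108 0.136 0.157 0.113 0.114 0.225 0.151 0.154 0.106 0.035 0.041
"""


def expand(lower_text):
    """Turn the lower-triangular text into a full symmetric matrix."""
    rows = [[float(x) for x in line.split()]
            for line in lower_text.strip().splitlines()]
    n = len(rows) + 1
    fst = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row):
            fst[i][j] = fst[j][i] = value
    return fst


def closest_pair(fst):
    """Return (label, label, Fst) for the least divergent pair."""
    n = len(fst)
    value, j, i = min((fst[i][j], j, i) for i in range(n) for j in range(i))
    return "C%d" % (j + 1), "C%d" % (i + 1), value


fst = expand(LOWER)
pair = closest_pair(fst)
print(pair)  # ('C11', 'C12', 0.014)
```

The closest pair is C11 (East African Bantus) and C12 (West African) at 0.014.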

Dendrogram of the same:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Chromosomal Admixture Painting

You have likely seen the 23andme ancestry painting and probably Doug McDonald's chromosome painting too. This is not quite the same.

Instead of looking at the ancestry of segments, I am looking at the ancestry of whole chromosomes. My curiosity about this analysis comes from my lovingly homozygous chromosome 9 and a-little-African chromosome 8.

Basically, this is the same Admixture analysis with Reference I dataset and a batch of Harappa project participants, except that instead of using all 22 chromosomes, I ran admixture separately on each chromosome.

Since each chromosome's data was processed separately, the ancestral components inferred for each chromosome are not exactly the same. In practice, at K=4 ancestral components, they stayed reasonably constant, but the errors are larger than in the overall autosomal admixture analysis. So you should be wary of assigning significance to minor changes in percentages from chromosome to chromosome. As a rule of thumb, a chromosome should differ from the autosomal result by more than 5% before you give it some thought.
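
To make the rule of thumb concrete, here is a small Python sketch that flags any chromosome whose component percentage differs from the autosomal value by more than 5 percentage points. The function name `flag_chromosomes` and all the numbers are made up for illustration; they are not anyone's actual results.

```python
THRESHOLD = 5.0  # percentage points, the rule of thumb from the text


def flag_chromosomes(autosomal, per_chromosome, threshold=THRESHOLD):
    """List (chromosome, component, difference) where the per-chromosome
    percentage is more than `threshold` points away from the autosomal one."""
    flagged = []
    for chrom in sorted(per_chromosome):
        for component, pct in per_chromosome[chrom].items():
            diff = pct - autosomal[component]
            if abs(diff) > threshold:
                flagged.append((chrom, component, round(diff, 1)))
    return flagged


# Hypothetical K=4 percentages for one imaginary participant.
autosomal = {"C1": 60.0, "C2": 30.0, "C3": 8.0, "C4": 2.0}
per_chromosome = {
    8: {"C1": 55.0, "C2": 29.0, "C3": 7.0, "C4": 9.0},
    9: {"C1": 62.0, "C2": 29.0, "C3": 8.0, "C4": 1.0},
}

flagged = flag_chromosomes(autosomal, per_chromosome)
print(flagged)  # [(8, 'C4', 7.0)]
```

In this toy example, chromosome 8 is flagged for C4 only. Its C1 value is exactly 5 points below the autosomal one, which does not exceed the threshold.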

I have run the same admixture analysis with K=6 for this group. Once I have analyzed that, I'll write about it. If you guys think it's something worth doing, then I can run the same analysis for the latter batches. Otherwise, we can look directly into proper chromosomal segment painting.

Reference II Admixture Analysis K=10

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1  South Asian
C2  Kalash
C3  Southwest Asian
C4  European
C5  Southeast Asian
C6  Northeast Asian
C7  Polynesian
C8  Papuan
C9  West African
C10 East African

Fst divergences dendrogram between estimated ancestral populations for K=10:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Harappa and Reference I Dendrograms

Looking at the Harappa dendrogram and the dendrogram for reference I, I thought I would combine them to see where our project participants fit.

Then I got more curious. I wanted to see a similarity tree of all the samples in Reference I (2,654) plus the 40 Harappa participants I have processed so far. That came out to be such a huge tree that I couldn't save it in any legible form. Finally I compromised by selecting only the South Asian samples from the Reference I dataset and putting them together with the Harappa data. Unfortunately, that tells the Iranian and European-admixed participants nothing useful. I'll have to analyze those separately.

Anyway, here's the South Asian Admixture Dendrogram in PDF format. That means you can search for "HRP" to find all the project members, which is why I like PDF in this case better than an image.

Note how good a stand-in the Singapore Indians are for South Indians.

Harappa Admixture Dendrogram

Using the ancestral component percentages from the Admixture run at K=12 for Harappa Project participants, we can calculate the pairwise Euclidean distance between them. These distances can then be used to build a complete-linkage (i.e. furthest-neighbor) hierarchical clustering, which you see below.

Note that this is not a phylogeny. It just visualizes the closeness of your admixture results to others.
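
For the curious, here is a minimal pure-Python sketch of the idea: Euclidean distances between admixture vectors, then complete-linkage merging. It is an illustration, not the code I used for the tree. The sample IDs S1 to S4 and their three-component fractions are invented, and the function names are mine.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two admixture vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(samples):
    """Agglomerative clustering with complete (furthest-neighbor) linkage.

    samples maps an ID to its list of component fractions.
    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    dist = {frozenset(pair): euclidean(samples[pair[0]], samples[pair[1]])
            for pair in combinations(samples, 2)}

    def linkage(pair):
        # Distance between two clusters = largest member-to-member distance.
        return max(dist[frozenset((x, y))] for x in pair[0] for y in pair[1])

    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented fractions for four imaginary samples (each row sums to 1).
samples = {
    "S1": [0.70, 0.20, 0.10],
    "S2": [0.68, 0.22, 0.10],
    "S3": [0.30, 0.60, 0.10],
    "S4": [0.25, 0.60, 0.15],
}

merges = complete_linkage(samples)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

The merge heights are what a dendrogram draws on its distance axis. With complete linkage, two clusters join at the distance between their two most dissimilar members.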

Thus in terms of admixture results, the Punjabis mostly cluster together along with the Rajasthani (HRP0033), except for my family (HRP0001 and HRP0035) who cluster (not so closely) with the Sindhi-Balochi guy (HRP0039) likely due to the Southwest Asian and African components.

Interestingly, the Bihari Brahmin (HRP0003) is very different from the Bihari Kayastha participant (HRP0032). The Caribbean Indian samples (HRP0027 & HRP0028) cluster with the Bihari Kayastha, so we can't really say for sure where in India their ancestors originated.

The South Indian Brahmin samples seem to vary consistently from the non-Brahmin ones.

The Iranians cluster closely except for the Khorasanian HRP0034 and Assyrian HRP0010. The Assyrian Iranian sample is actually closer to the Iraqi/Egyptian Jewish sample (HRP0037) than to other Iranians.

The participants with recent European admixture cluster very loosely with each other. Other techniques will need to be used to pinpoint their specific South Asian origins.

If we make a cut at about 0.3 on this tree, we get 3 South Asian clusters:

  • the Northwest of South Asia
  • South Indian Brahmins, Bihari Brahmin, UP Brahmin
  • South Indian non-Brahmin, Bihari non-Brahmin, Bengalis, Caribbean Indians
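
Cutting the tree at a given height amounts to merging clusters until the next complete-linkage merge would exceed that height. Here is a rough Python sketch under that assumption. The samples are again invented, `cut_complete_linkage` is a name I made up, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import combinations


def cut_complete_linkage(samples, height):
    """Return the clusters left when a complete-linkage tree is cut at `height`."""

    def linkage(pair):
        # Largest distance between any member of one cluster and any of the other.
        return max(math.dist(samples[x], samples[y])
                   for x in pair[0] for y in pair[1])

    clusters = [(name,) for name in samples]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        if linkage((a, b)) > height:
            break  # every remaining merge is above the cut
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(sorted(c) for c in clusters)


# Invented fractions for four imaginary samples.
samples = {
    "S1": [0.70, 0.20, 0.10],
    "S2": [0.68, 0.22, 0.10],
    "S3": [0.30, 0.60, 0.10],
    "S4": [0.25, 0.60, 0.15],
}

groups = cut_complete_linkage(samples, 0.3)
print(groups)  # [['S1', 'S2'], ['S3', 'S4']]
```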

I wish I had a thousand South Asian samples to play with. I wonder how this dendrogram would look in that case.