
Reference II Admixture Analysis K=13

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Balochistan/Caucasus
C3 Kalash C4 European
C5 Mediterranean C6 Southeast Asian
C7 Northeast Asian C8 Southwest Asian
C9 Polynesian C10 Papuan
C11 West African C12 East African
C13 Bushman

Fst divergences dendrogram between estimated ancestral populations for K=13:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference II Admixture Analysis K=12

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Kalash
C3 Southwest Asian C4 European
C5 Southeast Asian C6 Northeast Asian
C7 Papuan C8 Siberian
C9 East Bantu C10 Bushman
C11 East African C12 West African

Fst divergences dendrogram between estimated ancestral populations for K=12:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference II Admixture Analysis K=11

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Balochistan/Caucasus
C3 Kalash C4 Southwest Asian
C5 Southeast Asian C6 European
C7 Northeast Asian C8 Papuan
C9 West African C10 Bushman
C11 East African

Yes, I forgot to exclude the !Kung when I removed the San and Pygmy. It's too late to do that now since I don't want to start over with Reference II.

Fst divergences dendrogram between estimated ancestral populations for K=11:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference I Admixture Analysis K=14

Continuing with Reference I admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Balochistan/Caucasus
C3 Kalash C4 Southeast Asian
C5 European C6 Mediterranean
C7 Melanesian C8 Japanese
C9 Siberian C10 Papuan
C11 West African C12 Southwest Asian
C13 Chinese C14 East African

The Eastern Bantu component is gone. Papuan has split into Papuan and Melanesian. And Northeast Asian has split into Japanese and Chinese.

Fst divergences between estimated populations for K=14:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Reference I Admixture Analysis K=13

Continuing with Reference I admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Balochistan/Caucasus
C3 Kalash C4 Southeast Asian
C5 European C6 Mediterranean
C7 Papuan C8 Northeast Asian
C9 Siberian C10 Southwest Asian
C11 East African Bantus C12 West African
C13 East African

The new ancestral component at K=13 is C6. I am calling it Mediterranean because it peaks among Mozabites (63%) and Sardinians (61%) and is significant all over the Mediterranean.

Northern European groups lose some of the Pakistani/Caucasian component they had at K=12 and gain in the K=13 European component, so C5 can now probably be called Northern European. Most of the new Mediterranean component comes from the K=12 Southwest Asian component.

Fst divergences between estimated populations for K=13:

      C1     C2     C3     C4     C5     C6     C7     C8     C9     C10    C11    C12
C2    0.056
C3    0.066  0.062
C4    0.089  0.124  0.136
C5    0.061  0.035  0.066  0.125
C6    0.081  0.062  0.090  0.147  0.047
C7    0.167  0.204  0.219  0.202  0.205  0.226
C8    0.080  0.117  0.128  0.032  0.117  0.140  0.190
C9    0.085  0.114  0.126  0.059  0.113  0.137  0.203  0.039
C10   0.082  0.057  0.094  0.146  0.057  0.057  0.225  0.139  0.138
C11   0.145  0.152  0.177  0.192  0.158  0.163  0.258  0.187  0.191  0.152
C12   0.154  0.162  0.186  0.201  0.168  0.173  0.266  0.195  0.199  0.162  0.014
C13   0.108  0.108  0.136  0.157  0.113  0.114  0.225  0.151  0.154  0.106  0.035  0.041

Dendrogram of the same:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.
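For anyone who wants to work with the K=13 Fst numbers for Reference I directly, here is a minimal Python sketch that expands the lower-triangular table into a full symmetric matrix and lists the closest pairs of components. The values are copied from the table in this post; the function names `to_symmetric` and `closest_pairs` are my own illustrative choices, not part of Admixture or any other tool.

```python
# Lower-triangular Fst values copied from the K=13 Reference I table.
# Row i holds the divergences between component C(i+2) and C1..C(i+1).
LOWER = [
    [0.056],
    [0.066, 0.062],
    [0.089, 0.124, 0.136],
    [0.061, 0.035, 0.066, 0.125],
    [0.081, 0.062, 0.090, 0.147, 0.047],
    [0.167, 0.204, 0.219, 0.202, 0.205, 0.226],
    [0.080, 0.117, 0.128, 0.032, 0.117, 0.140, 0.190],
    [0.085, 0.114, 0.126, 0.059, 0.113, 0.137, 0.203, 0.039],
    [0.082, 0.057, 0.094, 0.146, 0.057, 0.057, 0.225, 0.139, 0.138],
    [0.145, 0.152, 0.177, 0.192, 0.158, 0.163, 0.258, 0.187, 0.191, 0.152],
    [0.154, 0.162, 0.186, 0.201, 0.168, 0.173, 0.266, 0.195, 0.199, 0.162, 0.014],
    [0.108, 0.108, 0.136, 0.157, 0.113, 0.114, 0.225, 0.151, 0.154, 0.106, 0.035, 0.041],
]

LABELS = [f"C{i}" for i in range(1, 14)]  # C1 .. C13


def to_symmetric(lower):
    """Expand a lower triangle (without diagonal) into a full square matrix."""
    n = len(lower) + 1
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = value
    return full


def closest_pairs(full, labels, count=3):
    """Return the `count` smallest off-diagonal entries as (fst, label_a, label_b)."""
    pairs = [
        (full[i][j], labels[j], labels[i])
        for i in range(len(full))
        for j in range(i)
    ]
    return sorted(pairs)[:count]


if __name__ == "__main__":
    matrix = to_symmetric(LOWER)
    for fst, a, b in closest_pairs(matrix, LABELS):
        print(f"{a} vs {b}: Fst = {fst}")
```

The smallest value in the table is 0.014, between C11 (East African Bantus) and C12 (West African), followed by 0.032 between C4 (Southeast Asian) and C8 (Northeast Asian).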

Reference II Admixture Analysis K=10

Continuing with Reference II admixture analysis, here is the results spreadsheet.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

If you can't see the interactive chart above, here's a static image.

C1 South Asian C2 Kalash
C3 Southwest Asian C4 European
C5 Southeast Asian C6 Northeast Asian
C7 Polynesian C8 Papuan
C9 West African C10 East African

Fst divergences dendrogram between estimated ancestral populations for K=10:

PS. This was run using Admixture version 1.04 so I can make an apples-to-apples comparison with the previous runs.

Harappa and Reference I Dendrograms

Looking at the Harappa dendrogram and the dendrogram for reference I, I thought I would combine them to see where our project participants fit.

Then I got more curious. I wanted to see a similarity tree of all the samples in Reference I (2,654) plus the 40 Harappa participants I have processed so far. The resulting tree was so large that I could not save it in a legible form. In the end I compromised by selecting only the South Asian samples from the Reference I dataset and putting them together with the Harappa data. Unfortunately, that tells the Iranian and European-admixed participants nothing about where they fit. I'll have to analyze those separately.

Anyway, here's the South Asian Admixture Dendrogram in PDF format. That means you can search for "HRP" to find all the project members, which is why I like PDF in this case better than an image.

Note that Singapore Indians are such a good stand-in for South Indians.

Fst for Reference I Admixture K=12

I had posted the Fst divergences between the estimated ancestral populations for the admixture analysis on the Reference I dataset. But a picture is worth a thousand words, and this dendrogram (using complete linkage) shows the Fst numbers fairly clearly.

Remember this is not a phylogeny.

Reference I Dendrogram

Handschar created a dendrogram using a hierarchical classifier based on K=12 admixture results and wondered:

When I run a classification based on simple euclidean distances (not a phylogeny), the Armenians and Turks, as they were, prior to the removal of the four North European admixed Behar samples in David's runs, cluster together. The North European component, in Dodecad Armenians, is practically nonexistent. I am not sure how the Harappa project "European" component translates to Dodecad components. If the admixed Armenians are included, it is possible their inclusion is impacting the Armenian population component percentages. Then again, even if included, perhaps your runs are picking up on something not previously detected. The Armenians, in previous classification runs, ordinarily matched one or more of the Caucasian Jewish groups.

While looking into his question, I figured that I would create some dendrograms too. The ones here are based on the K=12 admixture results of the Reference I dataset (spreadsheet). I am using the pairwise Euclidean distance between the admixture results of population groups to do a complete linkage hierarchical classification. So these dendrograms show which groups are closest in terms of their admixture percentages and do not show shared ancestry. In other words, this is not a phylogeny or a family tree.
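To make the procedure concrete, here is a minimal pure-Python sketch of complete linkage clustering on Euclidean distances between admixture vectors. It is not the code I used for the dendrograms. The group names and percentages in `TOY` are made up for illustration, and `complete_linkage` is a hypothetical helper name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two admixture vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles: dict mapping group name -> list of admixture fractions.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = {name: [name] for name in profiles}  # cluster id -> member groups
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(
                    euclidean(profiles[m], profiles[n])
                    for m in clusters[a]
                    for n in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, round(d, 3)))
    return merges


# Hypothetical three-component admixture fractions for four made-up groups.
TOY = {
    "GroupA": [0.70, 0.20, 0.10],
    "GroupB": [0.65, 0.25, 0.10],
    "GroupC": [0.10, 0.10, 0.80],
    "GroupD": [0.20, 0.05, 0.75],
}

if __name__ == "__main__":
    for a, b, d in complete_linkage(TOY):
        print(f"merge {a} + {b} at distance {d}")
```

The order of the merges is what a dendrogram draws: groups with similar admixture percentages join early, whatever their actual shared ancestry.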

First, I used the mean admixture percentages for each group, as given in the spreadsheet.

Reference 1 Mean Admixture Complete Linkage Dendrogram

There are a number of outliers in the dataset. For example, some Arabs and Sindhis with African admixture, some Armenians with a lot more European component than the rest, etc. Therefore, I thought a better approach would be to do the same classification using the median admixture percentages for each population group.
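Here is a small sketch of why the median is less sensitive to such outliers than the mean. The numbers are invented for illustration, and `summarize` is a hypothetical helper that takes the statistic component by component. Note that component-wise medians need not sum to exactly 100%.

```python
from statistics import mean, median


def summarize(samples, how=median):
    """Collapse per-individual admixture vectors into one vector per group.

    samples: list of equal-length lists, one per individual.
    how: statistic applied to each ancestral component separately.
    """
    return [how(component) for component in zip(*samples)]


# Made-up group of five individuals with two components.
# The last individual is an outlier with far more of component 1.
GROUP = [
    [0.05, 0.95],
    [0.06, 0.94],
    [0.04, 0.96],
    [0.05, 0.95],
    [0.40, 0.60],
]

if __name__ == "__main__":
    print("mean:  ", [round(v, 3) for v in summarize(GROUP, mean)])
    print("median:", summarize(GROUP, median))
```

In this toy group the single outlier pulls the mean of the first component up to about 12%, while the median stays at 5%.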

Reference 1 Median Admixture Complete Linkage Dendrogram

Using the medians for each population, the Armenians do match the Caucasian Jewish groups, as Handschar said.

UPDATE: Here's another dendrogram in which I take the mean of the ancestral components for each population after removing outliers.

Reference 1 Mean (No Outliers) Admixture Complete Linkage Dendrogram

Again, don't take these dendrograms to heart. All they show is the distance between the admixture results of different populations.

Reference I: Eurasian Subsets

We have established that none of the Harappa participants so far have African admixture, except for HRP0001 (me) and HRP0027 (Caribbean Indian). Since African populations are also the most diverse, it's best to remove them from our Reference I dataset and do some analysis using the Eurasian subset.

One option is to exclude the 517 samples of sub-Saharan African populations in our dataset:

  • Bantu Kenya: 11
  • Bantu South Africa: 8
  • Ethiopian Jews: 12
  • Ethiopians: 19
  • Kenyan Luhya: 101
  • Maasai: 135
  • Mandenka: 22
  • African Americans: 48
  • Yoruba: 161

However, in addition to the above, I decided to remove anyone from the Reference I dataset who had more than x% African ancestry (the sum of East African, East African Bantu and West African) in the K=12 admixture run. I created two Eurasian datasets: Eurasian90 and Eurasian95.
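The threshold rule can be sketched in a few lines of Python. This only illustrates the filtering step: the sample records, field names and values below are hypothetical and do not reflect the layout of my actual spreadsheet.

```python
# Hypothetical field names for the three African components at K=12.
AFRICAN_COMPONENTS = ("east_african", "east_african_bantu", "west_african")


def african_fraction(sample):
    """Sum of the three African components for one sample (as a fraction)."""
    return sum(sample[c] for c in AFRICAN_COMPONENTS)


def eurasian_subset(samples, max_african):
    """Keep samples whose African fraction does not exceed max_african."""
    return [s for s in samples if african_fraction(s) <= max_african]


# Made-up samples for illustration only.
SAMPLES = [
    {"id": "S1", "east_african": 0.00, "east_african_bantu": 0.00, "west_african": 0.01},
    {"id": "S2", "east_african": 0.04, "east_african_bantu": 0.01, "west_african": 0.02},
    {"id": "S3", "east_african": 0.10, "east_african_bantu": 0.02, "west_african": 0.03},
]

if __name__ == "__main__":
    eurasian90 = eurasian_subset(SAMPLES, 0.10)  # drops samples above 10%
    eurasian95 = eurasian_subset(SAMPLES, 0.05)  # drops samples above 5%
    print([s["id"] for s in eurasian90])
    print([s["id"] for s in eurasian95])
```

With these toy values, S3 (15% African) is dropped from both subsets and S2 (7%) is dropped only from the stricter one.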

Eurasian90 excludes all samples with more than 10% African admixture. That completely removes the following populations in addition to the above:

  • Egyptians: 12
  • Moroccans: 10
  • Mozabite: 29

Also, some samples from the following populations were removed for Eurasian90:

  • Balochi: 3/24
  • Bedouin: 19/46
  • Brahui: 2/25
  • Iranians: 3/19
  • Jordanians: 6/20
  • Lebanese: 2/7
  • Makrani: 3/25
  • Palestinian: 10/46
  • Saudis: 2/20
  • Sindhi: 2/24
  • Syrians: 2/16
  • Yemenese: 7/8

That's a total of 629 samples in the Reference I dataset with more than 10% African admixture. Thus Eurasian90 has 2,025 samples. The complete list is here.

The other dataset, Eurasian95, excludes everyone with more than 5% African admixture. Thus, in addition to the samples listed above, it excludes the following:

  • Balochi: 1
  • Bedouin: 19
  • Brahui: 1
  • Druze: 1
  • Iranians: 1
  • Jordanians: 14 (completely removed)
  • Makrani: 8
  • Morocco Jews: 2
  • Palestinian: 36 (completely removed)
  • Saudis: 16
  • Sindhi: 2
  • Syrians: 7
  • Yemenese: 1 (completely removed)
  • Yemen Jews: 15 (completely removed)

Eurasian95 is thus left with 1,901 samples, whose breakdown is listed here.
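As a sanity check on the counts, here is a short Python tally using only the numbers listed in this post (2,654 samples in Reference I and the per-population removals). The variable names are just for illustration.

```python
TOTAL_REFERENCE_I = 2654

# Populations removed entirely as sub-Saharan African.
SUB_SAHARAN = {
    "Bantu Kenya": 11, "Bantu South Africa": 8, "Ethiopian Jews": 12,
    "Ethiopians": 19, "Kenyan Luhya": 101, "Maasai": 135,
    "Mandenka": 22, "African Americans": 48, "Yoruba": 161,
}

# Populations removed entirely by the 10% threshold.
FULLY_REMOVED_90 = {"Egyptians": 12, "Moroccans": 10, "Mozabite": 29}

# Populations partially removed by the 10% threshold: (removed, population size).
PARTIAL_90 = {
    "Balochi": (3, 24), "Bedouin": (19, 46), "Brahui": (2, 25),
    "Iranians": (3, 19), "Jordanians": (6, 20), "Lebanese": (2, 7),
    "Makrani": (3, 25), "Palestinian": (10, 46), "Saudis": (2, 20),
    "Sindhi": (2, 24), "Syrians": (2, 16), "Yemenese": (7, 8),
}

# Additional samples removed by the 5% threshold.
EXTRA_95 = {
    "Balochi": 1, "Bedouin": 19, "Brahui": 1, "Druze": 1, "Iranians": 1,
    "Jordanians": 14, "Makrani": 8, "Morocco Jews": 2, "Palestinian": 36,
    "Saudis": 16, "Sindhi": 2, "Syrians": 7, "Yemenese": 1, "Yemen Jews": 15,
}

removed_90 = (
    sum(SUB_SAHARAN.values())
    + sum(FULLY_REMOVED_90.values())
    + sum(removed for removed, _ in PARTIAL_90.values())
)
eurasian90_size = TOTAL_REFERENCE_I - removed_90
eurasian95_size = eurasian90_size - sum(EXTRA_95.values())

if __name__ == "__main__":
    print("sub-Saharan samples:", sum(SUB_SAHARAN.values()))
    print("removed for Eurasian90:", removed_90)
    print("Eurasian90 size:", eurasian90_size)
    print("Eurasian95 size:", eurasian95_size)
```

The totals come out to 517, 629, 2,025 and 1,901, matching the figures given above.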

I'll be experimenting with both Eurasian90 and Eurasian95.