Reference 3 Admixture K=12

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=12.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Of course, the K=11 Onge component was too good to last. The Onge are too different from the other populations, so at K=12 they get their own isolated component.

Fst divergences between estimated populations for K=12 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
C2 0.089
C3 0.093 0.133
C4 0.172 0.211 0.189
C5 0.103 0.080 0.155 0.234
C6 0.094 0.055 0.140 0.218 0.056
C7 0.113 0.143 0.068 0.213 0.169 0.147
C8 0.179 0.219 0.204 0.280 0.237 0.225 0.228
C9 0.177 0.182 0.214 0.285 0.181 0.187 0.232 0.283
C10 0.164 0.178 0.139 0.276 0.214 0.180 0.143 0.290 0.280
C11 0.151 0.150 0.190 0.260 0.150 0.154 0.207 0.262 0.059 0.255
C12 0.256 0.260 0.295 0.373 0.261 0.265 0.314 0.367 0.116 0.364 0.131
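
If you want to work with these numbers outside the spreadsheet, the lower triangle can be expanded into a full symmetric lookup. What follows is only a rough Python sketch: the function name parse_lower_triangle and the variable names are my own for illustration and are not part of ADMIXTURE or any library.

```python
# K=12 Fst values copied from the table above (lower triangle, header row first).
FST_K12 = """\
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
C2 0.089
C3 0.093 0.133
C4 0.172 0.211 0.189
C5 0.103 0.080 0.155 0.234
C6 0.094 0.055 0.140 0.218 0.056
C7 0.113 0.143 0.068 0.213 0.169 0.147
C8 0.179 0.219 0.204 0.280 0.237 0.225 0.228
C9 0.177 0.182 0.214 0.285 0.181 0.187 0.232 0.283
C10 0.164 0.178 0.139 0.276 0.214 0.180 0.143 0.290 0.280
C11 0.151 0.150 0.190 0.260 0.150 0.154 0.207 0.262 0.059 0.255
C12 0.256 0.260 0.295 0.373 0.261 0.265 0.314 0.367 0.116 0.364 0.131
"""


def parse_lower_triangle(text):
    """Return {(row, col): fst} with both orderings of every pair filled in."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    fst = {}
    for row in rows:
        label, values = row[0], row[1:]
        # The n-th value in a row belongs to the n-th column label in the header.
        for other, value in zip(header, values):
            fst[(label, other)] = float(value)
            fst[(other, label)] = float(value)
    return fst


fst = parse_lower_triangle(FST_K12)

# Collapse to one entry per unordered pair, ordered by component number.
pairs = {tuple(sorted(key, key=lambda c: int(c[1:]))): value
         for key, value in fst.items()}

closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)
print("closest pair:", closest, pairs[closest])
print("most divergent pair:", farthest, pairs[farthest])
```

On the K=12 table this picks out C2 and C6 (0.055) as the closest pair and C4 and C12 (0.373) as the most divergent.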

Reference 3 Admixture K=11

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al's ASI (Ancestral South Indian):

Population        Reich ASI %   Onge Component %
Mala              61.2          39.9
Madiga            59.4          37.9
Chenchu           59.3          38.6
Bhil              57.1          37.5
Satnami           57            36.4
Kurumba           56.8          39.5
Kamsali           55.5          35.5
Vysya             53.8          34.4
Lodi              50.1          31.8
Naidu             49.9          32.1
Tharu             49            32.2
Velama            45.3          28.9
Srivastava        43.6          27.8
Meghawal          39.7          25.4
Vaish             37.4          23.8
Kashmiri-Pandit   29.4          17.6
Sindhi            26.3          13.4
Pathan            23.1          10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% and 50% and use the regression equation from the plot to calculate their ASI percentage. The results are in a spreadsheet. Several populations have an even higher Ancestral South Indian percentage than any of the Reich et al groups, with Paniya being the highest at 67.4%.
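
For anyone who wants to redo the fit, here is a minimal sketch of an ordinary least-squares regression of Reich ASI % on Onge component %, using the 18 populations in the table. The fitted equation itself lives in the plot, so this refits from the table values and its coefficients may differ slightly from the ones used for the spreadsheet. The function names fit_line and predict_asi are my own for illustration.

```python
# (population, Reich ASI %, Onge component %) copied from the table in this post.
DATA = [
    ("Mala", 61.2, 39.9), ("Madiga", 59.4, 37.9), ("Chenchu", 59.3, 38.6),
    ("Bhil", 57.1, 37.5), ("Satnami", 57.0, 36.4), ("Kurumba", 56.8, 39.5),
    ("Kamsali", 55.5, 35.5), ("Vysya", 53.8, 34.4), ("Lodi", 50.1, 31.8),
    ("Naidu", 49.9, 32.1), ("Tharu", 49.0, 32.2), ("Velama", 45.3, 28.9),
    ("Srivastava", 43.6, 27.8), ("Meghawal", 39.7, 25.4), ("Vaish", 37.4, 23.8),
    ("Kashmiri-Pandit", 29.4, 17.6), ("Sindhi", 26.3, 13.4), ("Pathan", 23.1, 10.6),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


onge = [row[2] for row in DATA]
asi = [row[1] for row in DATA]
slope, intercept, r = fit_line(onge, asi)


def predict_asi(onge_percent):
    """Estimate ASI % from Onge %, only inside the 10% to 50% range used in the post."""
    if not 10.0 <= onge_percent <= 50.0:
        raise ValueError("Onge component outside the 10% to 50% range")
    return slope * onge_percent + intercept


print(f"ASI % = {slope:.3f} * Onge % + {intercept:.2f}  (r = {r:.3f})")
```

The range check mirrors the 10% to 50% cutoff above: outside it the line is an extrapolation beyond the Reich et al populations and should not be trusted.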

Fst divergences between estimated populations for K=11 in the form of an MDS plot.

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
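
Here is one way to build a dendrogram from the K=11 table, sketched in Python. I am assuming average-linkage (UPGMA-style) clustering for the illustration, which may not match how the dendrogram in this post was drawn. The function names are my own.

```python
# K=11 Fst values copied from the table above (lower triangle, header row first).
FST_K11 = """\
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
"""


def parse_lower_triangle(text):
    """Return (labels, {(a, b): fst}) with both orderings of every pair."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    dist = {}
    for row in rows:
        for other, value in zip(header, row[1:]):
            dist[(row[0], other)] = dist[(other, row[0])] = float(value)
    return header + [rows[-1][0]], dist


def average_linkage(labels, dist):
    """Repeatedly merge the two clusters with the smallest mean pairwise Fst."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                mean = sum(dist[(x, y)] for x in a for y in b) / (len(a) * len(b))
                if best is None or mean < best[0]:
                    best = (mean, i, j)
        mean, i, j = best
        merges.append((clusters[i], clusters[j], mean))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


labels, dist = parse_lower_triangle(FST_K11)
merges = average_linkage(labels, dist)
for left, right, height in merges:
    print("+".join(left), "joins", "+".join(right), f"at Fst {height:.3f}")
```

Average linkage only groups components by overall Fst similarity, which is why the resulting tree should not be read as a phylogeny.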

Reference 3 Admixture K=10

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=10.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=10 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9
C2 0.110
C3 0.073 0.148
C4 0.090 0.161 0.065
C5 0.185 0.215 0.222 0.234
C6 0.099 0.038 0.138 0.152 0.201
C7 0.112 0.084 0.142 0.163 0.226 0.058
C8 0.166 0.217 0.182 0.171 0.277 0.211 0.225
C9 0.159 0.156 0.183 0.214 0.287 0.133 0.139 0.276
C10 0.233 0.286 0.248 0.243 0.349 0.280 0.295 0.097 0.349

Reference 3 Admixture K=9

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=9.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=9 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8
C2 0.098
C3 0.073 0.139
C4 0.090 0.152 0.064
C5 0.184 0.201 0.220 0.232
C6 0.113 0.068 0.147 0.166 0.223
C7 0.166 0.210 0.181 0.171 0.275 0.228
C8 0.158 0.139 0.181 0.212 0.285 0.143 0.276
C9 0.233 0.279 0.247 0.243 0.346 0.298 0.096 0.349

Reference 3 Admixture K=8

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=8.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=8 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7
C2 0.098
C3 0.073 0.139
C4 0.090 0.152 0.065
C5 0.184 0.201 0.220 0.231
C6 0.113 0.068 0.147 0.166 0.223
C7 0.164 0.208 0.180 0.170 0.273 0.227
C8 0.158 0.139 0.181 0.212 0.285 0.143 0.275

Behar Bene Israel

As Razib and I were discussing, the four Bnei Menashe Jewish samples from Behar et al didn't look right. Bnei Menashe are from Mizoram in the northeast of India, so they should be expected to have some East Asian admixture.

When I tried to confirm the admixture/PCA results for Bnei Menashe in the Behar et al paper, I didn't find any mention of the group. Instead, the South Asian Jewish group they mentioned was Bene Israel. According to their admixture and PCA results, Bene Israel looked more like Pakistani populations than their Indian host populations. This is consistent with what my admixture runs show.

So I suspected that the four Bene Israel samples mentioned in the Behar et al paper were accidentally labeled as Bnei Menashe in the dataset. I sent an email to the authors and they have confirmed that this was the case.

I have corrected all my spreadsheets so you should see Bene Israel instead of Bnei Menashe now. If you spot Bnei Menashe anywhere, please let me know.

PS. Also, it has been confirmed that three Paniya samples were mislabeled when the data was submitted to the GEO database. A fix is being worked on.

UPDATE: Mait Metspalu tells me that the database has been updated with the fixed version of the Behar et al dataset.

Reference 3 Admixture K=7

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=7.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=7 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6
C2 0.099
C3 0.102 0.150
C4 0.083 0.139 0.062
C5 0.117 0.069 0.164 0.146
C6 0.168 0.208 0.171 0.179 0.227
C7 0.161 0.140 0.209 0.181 0.143 0.274

Reference 3 Admixture K=6

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=6.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=6 in the form of MDS plots.

And the numbers:
C1 C2 C3 C4 C5
C2 0.088
C3 0.085 0.132
C4 0.097 0.145 0.067
C5 0.165 0.203 0.182 0.171
C6 0.154 0.128 0.176 0.205 0.269

Reference 3 Admixture K=5

UPDATE: With fixed Reference 3.

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=5.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=5 in the form of MDS plots.

And the numbers:
C1 C2 C3 C4
C2 0.078
C3 0.088 0.126
C4 0.153 0.176 0.127
C5 0.160 0.165 0.201 0.266

Reference 3 Admixture K=4

UPDATE: With fixed Reference 3.

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=4.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

I have added a new feature to this bar chart. When you click on the legend to sort by a specific component, the chart also filters the results so that only populations with at least 5% of that component are shown. Let me know what you think about this.
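
In case the behavior is unclear, here is a small Python sketch of the same filter-then-sort logic. It is not the chart's actual code. The population names and proportions are made up for illustration, and sorting in descending order is my assumption.

```python
# Hypothetical K=4 admixture proportions (fractions of 1); NOT real results.
RESULTS = {
    "Pop A": {"C1": 0.70, "C2": 0.20, "C3": 0.04, "C4": 0.06},
    "Pop B": {"C1": 0.10, "C2": 0.55, "C3": 0.30, "C4": 0.05},
    "Pop C": {"C1": 0.25, "C2": 0.25, "C3": 0.45, "C4": 0.05},
    "Pop D": {"C1": 0.90, "C2": 0.08, "C3": 0.00, "C4": 0.02},
}


def sort_by_component(results, component, threshold=0.05):
    """Keep populations with at least `threshold` of `component`, highest first."""
    kept = [(pop, props[component])
            for pop, props in results.items()
            if props[component] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


for pop, share in sort_by_component(RESULTS, "C3"):
    print(f"{pop}: {share:.0%}")
```

With these made-up numbers, sorting by C3 keeps only Pop C and Pop B, because Pop A (4%) and Pop D (0%) fall below the 5% cutoff.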

Fst divergences between estimated populations for K=4 in the form of an MDS plot.

And the numbers:
C1 C2 C3
C2 0.120
C3 0.158 0.198
C4 0.168 0.124 0.267
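
For the curious, this is roughly how a table like the one above can be turned into 2-D coordinates for an MDS plot. I am assuming classical (Torgerson) MDS for this sketch, which may not be the method behind the plots in these posts. It uses plain Python with power iteration instead of a linear algebra library, and all function names are my own.

```python
import math

LABELS = ["C1", "C2", "C3", "C4"]
# K=4 Fst values from the table above, written out as a full symmetric matrix.
FST = [
    [0.000, 0.120, 0.158, 0.168],
    [0.120, 0.000, 0.198, 0.124],
    [0.158, 0.198, 0.000, 0.267],
    [0.168, 0.124, 0.267, 0.000],
]


def double_center(dist):
    """B = -1/2 * J * D^2 * J, the Gram matrix used by classical MDS."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iterations=1000):
    """Largest-magnitude eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    vec = [math.sin(i + 1.0) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec


def classical_mds_2d(dist):
    """Return 2-D coordinates and the two eigenvalues they are scaled by."""
    n = len(dist)
    b = double_center(dist)
    lam1, v1 = top_eigenpair(b)
    # Deflate: remove the first axis so power iteration finds the second one.
    deflated = [[b[i][j] - lam1 * v1[i] * v1[j] for j in range(n)] for i in range(n)]
    lam2, v2 = top_eigenpair(deflated)
    scale1, scale2 = math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))
    return [(scale1 * v1[i], scale2 * v2[i]) for i in range(n)], (lam1, lam2)


coords, (lam1, lam2) = classical_mds_2d(FST)
for label, (x, y) in zip(LABELS, coords):
    print(f"{label}: ({x:+.3f}, {y:+.3f})")
```

The sign of each axis is arbitrary, so a plot made this way may come out mirrored relative to another MDS run on the same numbers.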