Monthly Archives: April 2011 - Page 2

Reference 3 Admixture K=14

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=14.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

This one I am going to classify as a bad run. The East Asian splits are weird.

Fst divergences between estimated populations for K=14 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13
C2 0.109
C3 0.110 0.160
C4 0.239 0.264 0.247
C5 0.107 0.080 0.161 0.267
C6 0.116 0.111 0.176 0.284 0.102
C7 0.132 0.180 0.092 0.265 0.176 0.195
C8 0.189 0.237 0.214 0.335 0.239 0.251 0.237
C9 0.178 0.206 0.154 0.324 0.192 0.229 0.164 0.294
C10 0.217 0.246 0.191 0.373 0.242 0.262 0.229 0.338 0.285
C11 0.209 0.220 0.248 0.350 0.230 0.223 0.272 0.314 0.312 0.344
C12 0.266 0.278 0.307 0.417 0.286 0.281 0.333 0.373 0.374 0.406 0.179
C13 0.143 0.143 0.186 0.287 0.149 0.135 0.209 0.254 0.247 0.278 0.117 0.177
C14 0.364 0.368 0.410 0.528 0.372 0.377 0.437 0.490 0.481 0.514 0.334 0.359 0.283

This is the last plot I am posting in this series of admixture runs, since the cross-validation error is minimized at K=14.

For some reason, Admixture starts acting weird at values of K higher than about 14-15.

Reference 3 Admixture K=13

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=13.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

The Hadza were expected to split but I thought the San/Pygmy would split first.

Fst divergences between estimated populations for K=13 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
C2 0.093
C3 0.098 0.141
C4 0.179 0.212 0.192
C5 0.100 0.056 0.150 0.224
C6 0.112 0.149 0.075 0.210 0.153
C7 0.109 0.062 0.161 0.234 0.072 0.170
C8 0.181 0.222 0.208 0.279 0.232 0.226 0.239
C9 0.198 0.202 0.239 0.308 0.217 0.254 0.208 0.306
C10 0.164 0.186 0.145 0.276 0.184 0.146 0.217 0.290 0.303
C11 0.320 0.318 0.365 0.443 0.336 0.381 0.325 0.444 0.284 0.437
C12 0.261 0.263 0.302 0.377 0.277 0.318 0.270 0.371 0.153 0.370 0.278
C13 0.137 0.124 0.180 0.248 0.138 0.193 0.121 0.250 0.088 0.241 0.288 0.163

Admixture Onge Component Map

Since the Onge component on my K=11 admixture run was very strongly correlated with Reich et al's Ancestral South Indian (ASI) estimates, Simranjit has been kind enough to let me share his map of the Onge component in South Asia.

He also has maps of the K=12 admixture run.

Reference 3 Admixture K=12

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=12.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

The K=11 Onge component was too good to last. The Onge are too different from the other populations, so of course they get their own isolated component.

Fst divergences between estimated populations for K=12 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11
C2 0.089
C3 0.093 0.133
C4 0.172 0.211 0.189
C5 0.103 0.080 0.155 0.234
C6 0.094 0.055 0.140 0.218 0.056
C7 0.113 0.143 0.068 0.213 0.169 0.147
C8 0.179 0.219 0.204 0.280 0.237 0.225 0.228
C9 0.177 0.182 0.214 0.285 0.181 0.187 0.232 0.283
C10 0.164 0.178 0.139 0.276 0.214 0.180 0.143 0.290 0.280
C11 0.151 0.150 0.190 0.260 0.150 0.154 0.207 0.262 0.059 0.255
C12 0.256 0.260 0.295 0.373 0.261 0.265 0.314 0.367 0.116 0.364 0.131

Reference 3 Admixture K=11

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al's ASI (Ancestral South Indian):

Population        Reich ASI %   Onge Component %
Mala              61.2          39.9
Madiga            59.4          37.9
Chenchu           59.3          38.6
Bhil              57.1          37.5
Satnami           57.0          36.4
Kurumba           56.8          39.5
Kamsali           55.5          35.5
Vysya             53.8          34.4
Lodi              50.1          31.8
Naidu             49.9          32.1
Tharu             49.0          32.2
Velama            45.3          28.9
Srivastava        43.6          27.8
Meghawal          39.7          25.4
Vaish             37.4          23.8
Kashmiri-Pandit   29.4          17.6
Sindhi            26.3          13.4
Pathan            23.1          10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% and 50% and use the regression equation to calculate their ASI percentage. The results are in a spreadsheet. Several populations have an even higher estimated Ancestral South Indian percentage than any of the Reich et al groups, with Paniya being the highest at 67.4%.
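For anyone who wants to redo the fit, here is a minimal sketch in plain Python using the 18 populations from the table above. It assumes an ordinary least squares line of the form ASI = slope * Onge + intercept, which may not be exactly how the plotted equation was produced. The names `fit_line` and `predict_asi` are made up for this example.

```python
# (Reich ASI %, Onge component %) pairs copied from the table in this post.
DATA = {
    "Mala": (61.2, 39.9), "Madiga": (59.4, 37.9), "Chenchu": (59.3, 38.6),
    "Bhil": (57.1, 37.5), "Satnami": (57.0, 36.4), "Kurumba": (56.8, 39.5),
    "Kamsali": (55.5, 35.5), "Vysya": (53.8, 34.4), "Lodi": (50.1, 31.8),
    "Naidu": (49.9, 32.1), "Tharu": (49.0, 32.2), "Velama": (45.3, 28.9),
    "Srivastava": (43.6, 27.8), "Meghawal": (39.7, 25.4), "Vaish": (37.4, 23.8),
    "Kashmiri-Pandit": (29.4, 17.6), "Sindhi": (26.3, 13.4), "Pathan": (23.1, 10.6),
}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept, plus r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


asi = [pair[0] for pair in DATA.values()]
onge = [pair[1] for pair in DATA.values()]
slope, intercept, r_squared = fit_line(onge, asi)


def predict_asi(onge_pct):
    """Estimate ASI % from an Onge component %, limited to the 10-50% range used in the post."""
    if not 10 <= onge_pct <= 50:
        raise ValueError("Onge component outside the 10-50% range")
    return slope * onge_pct + intercept


print(f"ASI = {slope:.3f} * Onge + {intercept:.3f}, r^2 = {r_squared:.3f}")
```

Calling `predict_asi` on the Onge percentage of any other reference population gives its estimated ASI percentage. Keep in mind that values above about 40% Onge are extrapolations beyond the Reich et al groups.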

Fst divergences between estimated populations for K=11 in the form of an MDS plot.

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
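
If you want to build a dendrogram from these numbers yourself, here is a rough sketch in plain Python. It assumes average-linkage (UPGMA-style) clustering, which is my choice for the example and not necessarily the method behind the dendrogram in the post. The function names are made up for illustration.

```python
from itertools import combinations

# Lower-triangular Fst table for K=11, copied from the rows above.
FST_K11 = """\
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
"""


def parse_lower_triangle(text):
    """Return (labels, dist), where dist[frozenset({a, b})] is the Fst between a and b."""
    labels = ["C1"]  # C1 only appears in the header row, so it is added by hand
    dist = {}
    for line in text.strip().splitlines():
        name, *values = line.split()
        for other, value in zip(labels, values):
            dist[frozenset((name, other))] = float(value)
        labels.append(name)
    return labels, dist


def linkage(clusters, dist, a, b):
    """Average Fst over all pairs of components drawn from clusters a and b."""
    gaps = [dist[frozenset((x, y))] for x in clusters[a] for y in clusters[b]]
    return sum(gaps) / len(gaps)


def average_linkage(labels, dist):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = {label: [label] for label in labels}
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(clusters, dist, *pair))
        merges.append((a, b, round(linkage(clusters, dist, a, b), 3)))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


labels, dist = parse_lower_triangle(FST_K11)
merges = average_linkage(labels, dist)
for a, b, height in merges:
    print(f"{height:.3f}  {a} + {b}")
```

The merge order only reflects how similar the estimated components are by Fst. As noted above, it is not a phylogeny.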

Reference 3 Admixture K=10

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=10.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=10 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8 C9
C2 0.110
C3 0.073 0.148
C4 0.090 0.161 0.065
C5 0.185 0.215 0.222 0.234
C6 0.099 0.038 0.138 0.152 0.201
C7 0.112 0.084 0.142 0.163 0.226 0.058
C8 0.166 0.217 0.182 0.171 0.277 0.211 0.225
C9 0.159 0.156 0.183 0.214 0.287 0.133 0.139 0.276
C10 0.233 0.286 0.248 0.243 0.349 0.280 0.295 0.097 0.349

Reference 3 Admixture K=9

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=9.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=9 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7 C8
C2 0.098
C3 0.073 0.139
C4 0.090 0.152 0.064
C5 0.184 0.201 0.220 0.232
C6 0.113 0.068 0.147 0.166 0.223
C7 0.166 0.210 0.181 0.171 0.275 0.228
C8 0.158 0.139 0.181 0.212 0.285 0.143 0.276
C9 0.233 0.279 0.247 0.243 0.346 0.298 0.096 0.349

Reference 3 Admixture K=8

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=8.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=8 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6 C7
C2 0.098
C3 0.073 0.139
C4 0.090 0.152 0.065
C5 0.184 0.201 0.220 0.231
C6 0.113 0.068 0.147 0.166 0.223
C7 0.164 0.208 0.180 0.170 0.273 0.227
C8 0.158 0.139 0.181 0.212 0.285 0.143 0.275

Behar Bene Israel

As Razib and I were discussing, the four Bnei Menashe Jewish samples from Behar et al didn't look right since Bnei Menashe are from Mizoram in the northeast of India and thus should be expected to have some East Asian admixture.

When I tried to confirm the admixture/PCA results for Bnei Menashe in the Behar et al paper, I didn't find any mention of the group. Instead, the South Asian Jewish group they mentioned was Bene Israel. According to their admixture and PCA results, Bene Israel looked more like Pakistani populations than their Indian host populations. This is consistent with what my admixture runs show.

So I suspected that the four Bene Israel samples mentioned in the Behar et al paper were accidentally labeled as Bnei Menashe in the dataset. I sent an email to the authors and they have confirmed that this was the case.

I have corrected all my spreadsheets so you should see Bene Israel instead of Bnei Menashe now. If you spot Bnei Menashe anywhere, please let me know.

PS. Also, it has been confirmed that three Paniya samples were mislabeled when the data was submitted to the GEO database. The authors are working on a fix.

UPDATE: Mait Metspalu tells me that the database has been updated with the fixed version of the Behar et al dataset.

Reference 3 Admixture K=7

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=7.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=7 in the form of an MDS plot.

And the numbers:
C1 C2 C3 C4 C5 C6
C2 0.099
C3 0.102 0.150
C4 0.083 0.139 0.062
C5 0.117 0.069 0.164 0.146
C6 0.168 0.208 0.171 0.179 0.227
C7 0.161 0.140 0.209 0.181 0.143 0.274
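
The Fst tables in these posts only list the lower triangle, while most MDS or clustering tools expect a full symmetric matrix. Here is a small sketch in plain Python that expands the K=7 table above and picks out the closest and most divergent pairs of components. The helper name `to_square` is made up for this example.

```python
# Lower-triangular Fst table for K=7, copied from the rows above.
FST_K7 = """\
C2 0.099
C3 0.102 0.150
C4 0.083 0.139 0.062
C5 0.117 0.069 0.164 0.146
C6 0.168 0.208 0.171 0.179 0.227
C7 0.161 0.140 0.209 0.181 0.143 0.274
"""


def to_square(text):
    """Expand a lower-triangular table into labels and a full symmetric matrix."""
    rows = [line.split() for line in text.strip().splitlines()]
    labels = ["C1"] + [row[0] for row in rows]  # C1 comes from the header row
    size = len(labels)
    matrix = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row[1:]):
            matrix[i][j] = matrix[j][i] = float(value)
    return labels, matrix


labels, matrix = to_square(FST_K7)

# One (Fst, component, component) entry per unordered pair.
pairs = [(matrix[i][j], labels[j], labels[i])
         for i in range(len(labels)) for j in range(i)]
closest = min(pairs)
farthest = max(pairs)
print("closest:", closest)
print("most divergent:", farthest)
```

The same function works on the tables for the other values of K, since they all follow the same layout.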