Monthly Archives: April 2011 - Page 2

Reference 3 Admixture K=14

Posted by Zack on April 23, 2011 11 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=14.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

This one I am going to classify as a bad run. The East Asian splits are weird.

Fst divergences between estimated populations for K=14 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10	C11	C12	C13
C2	0.109
C3	0.110	0.160
C4	0.239	0.264	0.247
C5	0.107	0.080	0.161	0.267
C6	0.116	0.111	0.176	0.284	0.102
C7	0.132	0.180	0.092	0.265	0.176	0.195
C8	0.189	0.237	0.214	0.335	0.239	0.251	0.237
C9	0.178	0.206	0.154	0.324	0.192	0.229	0.164	0.294
C10	0.217	0.246	0.191	0.373	0.242	0.262	0.229	0.338	0.285
C11	0.209	0.220	0.248	0.350	0.230	0.223	0.272	0.314	0.312	0.344
C12	0.266	0.278	0.307	0.417	0.286	0.281	0.333	0.373	0.374	0.406	0.179
C13	0.143	0.143	0.186	0.287	0.149	0.135	0.209	0.254	0.247	0.278	0.117	0.177
C14	0.364	0.368	0.410	0.528	0.372	0.377	0.437	0.490	0.481	0.514	0.334	0.359	0.283

This is the last plot I am posting in this series of admixture runs since the cross-validation error is minimized at K=14.

For some reason, Admixture starts acting weird at values of K higher than about 14-15.
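
For readers who want to pick K the same way, here is a minimal sketch of choosing the run with the lowest cross-validation error. It assumes the logs come from ADMIXTURE runs with cross-validation enabled, which print one `CV error (K=...)` line per run. The function name `best_k` and the sample values are hypothetical placeholders, not the errors from the runs in this series.

```python
import re

# ADMIXTURE prints one line of this form per run when cross-validation is on.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) from concatenated logs."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no 'CV error' lines found")
    return min(errors, key=errors.get), errors


# Placeholder values for illustration only; NOT results from this post.
sample_log = """
CV error (K=2): 0.60
CV error (K=3): 0.55
CV error (K=4): 0.57
"""

k, errors = best_k(sample_log)
print(k, errors[k])
```

In practice you would concatenate the log files from every K and pass the combined text to `best_k`.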

Reference 3 Admixture K=13

Posted by Zack on April 22, 2011 20 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=13.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

The Hadza were expected to split but I thought the San/Pygmy would split first.

Fst divergences between estimated populations for K=13 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10	C11	C12
C2	0.093
C3	0.098	0.141
C4	0.179	0.212	0.192
C5	0.100	0.056	0.150	0.224
C6	0.112	0.149	0.075	0.210	0.153
C7	0.109	0.062	0.161	0.234	0.072	0.170
C8	0.181	0.222	0.208	0.279	0.232	0.226	0.239
C9	0.198	0.202	0.239	0.308	0.217	0.254	0.208	0.306
C10	0.164	0.186	0.145	0.276	0.184	0.146	0.217	0.290	0.303
C11	0.320	0.318	0.365	0.443	0.336	0.381	0.325	0.444	0.284	0.437
C12	0.261	0.263	0.302	0.377	0.277	0.318	0.270	0.371	0.153	0.370	0.278
C13	0.137	0.124	0.180	0.248	0.138	0.193	0.121	0.250	0.088	0.241	0.288	0.163

Admixture Onge Component Map

Posted by Zack on April 22, 2011 Comments Off

Since the Onge component on my K=11 admixture run was very strongly correlated with Reich et al's Ancestral South Indian, Simranjit has been kind enough to let me share his map of the Onge component in South Asia.

He also has maps of the K=12 admixture run.

Reference 3 Admixture K=12

Posted by Zack on April 22, 2011 3 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=12.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Of course, the K=11 Onge component was too good to last. The Onge are too different from the other populations, so they get their own isolated component.

Fst divergences between estimated populations for K=12 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10	C11
C2	0.089
C3	0.093	0.133
C4	0.172	0.211	0.189
C5	0.103	0.080	0.155	0.234
C6	0.094	0.055	0.140	0.218	0.056
C7	0.113	0.143	0.068	0.213	0.169	0.147
C8	0.179	0.219	0.204	0.280	0.237	0.225	0.228
C9	0.177	0.182	0.214	0.285	0.181	0.187	0.232	0.283
C10	0.164	0.178	0.139	0.276	0.214	0.180	0.143	0.290	0.280
C11	0.151	0.150	0.190	0.260	0.150	0.154	0.207	0.262	0.059	0.255
C12	0.256	0.260	0.295	0.373	0.261	0.265	0.314	0.367	0.116	0.364	0.131

Reference 3 Admixture K=11

Posted by Zack on April 21, 2011 39 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=11.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

You don't know how excited I am to see the Onge (C2) component. Let's compare the Onge component with Reich et al's ASI (Ancestral South Indian):

	Reich ASI %	Onge Component %
Mala	61.2	39.9
Madiga	59.4	37.9
Chenchu	59.3	38.6
Bhil	57.1	37.5
Satnami	57	36.4
Kurumba	56.8	39.5
Kamsali	55.5	35.5
Vysya	53.8	34.4
Lodi	50.1	31.8
Naidu	49.9	32.1
Tharu	49	32.2
Velama	45.3	28.9
Srivastava	43.6	27.8
Meghawal	39.7	25.4
Vaish	37.4	23.8
Kashmiri-Pandit	29.4	17.6
Sindhi	26.3	13.4
Pathan	23.1	10.6

Let's plot that with a linear regression:

How do you like that?

Now let's take all the reference populations with an Onge component between 10% and 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian percentage than any of the Reich et al groups, with Paniya being the highest at 67.4%.
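
The regression can be redone from the table in this post. What follows is a minimal sketch: an ordinary least-squares fit of Reich ASI % on Onge component % over the 18 populations listed. The fitted coefficients may differ slightly from the equation on the plot, which is not reproduced in the text. The names `fit_line` and `estimate_asi` are hypothetical helpers.

```python
# (Reich ASI %, Onge component %) pairs copied from the table in this post.
TABLE = {
    "Mala": (61.2, 39.9), "Madiga": (59.4, 37.9), "Chenchu": (59.3, 38.6),
    "Bhil": (57.1, 37.5), "Satnami": (57.0, 36.4), "Kurumba": (56.8, 39.5),
    "Kamsali": (55.5, 35.5), "Vysya": (53.8, 34.4), "Lodi": (50.1, 31.8),
    "Naidu": (49.9, 32.1), "Tharu": (49.0, 32.2), "Velama": (45.3, 28.9),
    "Srivastava": (43.6, 27.8), "Meghawal": (39.7, 25.4),
    "Vaish": (37.4, 23.8), "Kashmiri-Pandit": (29.4, 17.6),
    "Sindhi": (26.3, 13.4), "Pathan": (23.1, 10.6),
}


def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (slope, intercept, r squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy * sxy / (sxx * syy)


onge = [o for _, o in TABLE.values()]
asi = [a for a, _ in TABLE.values()]
slope, intercept, r2 = fit_line(onge, asi)


def estimate_asi(onge_pct):
    """Estimated ASI % for an Onge % in the 10-50 range used in the post, else None."""
    if not 10.0 <= onge_pct <= 50.0:
        return None
    return slope * onge_pct + intercept


print(f"ASI = {slope:.3f} * Onge + {intercept:.2f}, r^2 = {r2:.3f}")
```

The 10% to 50% guard mirrors the range used in the post and stays close to the Onge values the line was fitted on.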

Fst divergences between estimated populations for K=11 in the form of an MDS plot.

I guess you might want to see the Fst dendrogram too. Just remember it's not a phylogeny.

And the numbers:

	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10
C2	0.165
C3	0.121	0.122
C4	0.090	0.161	0.152
C5	0.071	0.152	0.137	0.048
C6	0.134	0.144	0.067	0.163	0.143
C7	0.184	0.224	0.216	0.179	0.186	0.232
C8	0.210	0.209	0.205	0.235	0.223	0.228	0.286
C9	0.175	0.207	0.139	0.208	0.178	0.141	0.281	0.290
C10	0.261	0.304	0.294	0.257	0.261	0.311	0.123	0.367	0.364
C11	0.150	0.195	0.187	0.143	0.148	0.203	0.059	0.260	0.252	0.133
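
The lower-triangular layout is easy to read into a script. This is a minimal sketch that parses the K=11 numbers above and reports the least and most divergent pairs of components. The helper name `parse_lower_triangle` is hypothetical, and the labels are just the C1 to C11 column names from the table.

```python
# Lower triangle of the K=11 Fst table, copied from the numbers above.
FST_TEXT = """
C2 0.165
C3 0.121 0.122
C4 0.090 0.161 0.152
C5 0.071 0.152 0.137 0.048
C6 0.134 0.144 0.067 0.163 0.143
C7 0.184 0.224 0.216 0.179 0.186 0.232
C8 0.210 0.209 0.205 0.235 0.223 0.228 0.286
C9 0.175 0.207 0.139 0.208 0.178 0.141 0.281 0.290
C10 0.261 0.304 0.294 0.257 0.261 0.311 0.123 0.367 0.364
C11 0.150 0.195 0.187 0.143 0.148 0.203 0.059 0.260 0.252 0.133
"""


def parse_lower_triangle(text):
    """Return {(Ci, Cj): Fst} with the lower-numbered component first."""
    pairs = {}
    for line in text.strip().splitlines():
        label, *values = line.split()
        # The n-th value on a row is the Fst against component Cn.
        for col, value in enumerate(values, start=1):
            pairs[(f"C{col}", label)] = float(value)
    return pairs


pairs = parse_lower_triangle(FST_TEXT)
closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)
print("closest:", closest, pairs[closest])
print("farthest:", farthest, pairs[farthest])
```

Because the parser splits on any whitespace, it works on both the tab-separated and space-separated tables in these posts.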

Reference 3 Admixture K=10

Posted by Zack on April 20, 2011 Comments Off

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=10.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=10 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7	C8	C9
C2	0.110
C3	0.073	0.148
C4	0.090	0.161	0.065
C5	0.185	0.215	0.222	0.234
C6	0.099	0.038	0.138	0.152	0.201
C7	0.112	0.084	0.142	0.163	0.226	0.058
C8	0.166	0.217	0.182	0.171	0.277	0.211	0.225
C9	0.159	0.156	0.183	0.214	0.287	0.133	0.139	0.276
C10	0.233	0.286	0.248	0.243	0.349	0.280	0.295	0.097	0.349

Reference 3 Admixture K=9

Posted by Zack on April 20, 2011 2 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=9.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=9 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7	C8
C2	0.098
C3	0.073	0.139
C4	0.090	0.152	0.064
C5	0.184	0.201	0.220	0.232
C6	0.113	0.068	0.147	0.166	0.223
C7	0.166	0.210	0.181	0.171	0.275	0.228
C8	0.158	0.139	0.181	0.212	0.285	0.143	0.276
C9	0.233	0.279	0.247	0.243	0.346	0.298	0.096	0.349

Reference 3 Admixture K=8

Posted by Zack on April 20, 2011 21 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=8.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=8 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6	C7
C2	0.098
C3	0.073	0.139
C4	0.090	0.152	0.065
C5	0.184	0.201	0.220	0.231
C6	0.113	0.068	0.147	0.166	0.223
C7	0.164	0.208	0.180	0.170	0.273	0.227
C8	0.158	0.139	0.181	0.212	0.285	0.143	0.275

Behar Bene Israel

Posted by Zack on April 19, 2011 1 comment

As Razib and I were discussing, the four Bnei Menashe Jewish samples from Behar et al didn't look right since Bnei Menashe are from Mizoram in the northeast of India and thus should be expected to have some East Asian admixture.

When I tried to confirm the admixture/PCA results for Bnei Menashe in the Behar et al paper, I didn't find any mention of the group. Instead, the South Asian Jewish group they mentioned was Bene Israel. According to their admixture and PCA results, Bene Israel looked more like Pakistani populations than their Indian host populations. This is consistent with what my admixture runs show.

So I suspected that the four Bene Israel samples mentioned in the Behar et al paper were accidentally labeled as Bnei Menashe in the dataset. I sent an email to the authors and they have confirmed that this was the case.

I have corrected all my spreadsheets so you should see Bene Israel instead of Bnei Menashe now. If you spot Bnei Menashe anywhere, please let me know.

PS. Also, it has been confirmed that three Paniya samples were mislabeled when the data was submitted to the GEO database. They are working on fixing it soon.

UPDATE: Mait Metspalu tells me that the database has been updated with the fixed version of the Behar et al dataset.

Reference 3 Admixture K=7

Posted by Zack on April 19, 2011 2 comments

Continuing with the admixture analysis with our new reference 3 dataset.

Here's the results spreadsheet for K=7.

You can click on the legend to the right of the bar chart to sort by different ancestral components.

Fst divergences between estimated populations for K=7 in the form of an MDS plot.

And the numbers:
	C1	C2	C3	C4	C5	C6
C2	0.099
C3	0.102	0.150
C4	0.083	0.139	0.062
C5	0.117	0.069	0.164	0.146
C6	0.168	0.208	0.171	0.179	0.227
C7	0.161	0.140	0.209	0.181	0.143	0.274
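
For anyone curious how a matrix like this becomes an MDS plot, here is a rough sketch of classical (Torgerson) MDS on the K=7 numbers above, in plain Python. It treats each Fst value as a distance, which is an assumption, and it finds the top two axes with simple power iteration. This is not necessarily the method or software used for the plots in these posts, and all function names are hypothetical.

```python
import math

# Lower triangle of the K=7 Fst table, copied from the numbers above.
FST_TEXT = """
C2 0.099
C3 0.102 0.150
C4 0.083 0.139 0.062
C5 0.117 0.069 0.164 0.146
C6 0.168 0.208 0.171 0.179 0.227
C7 0.161 0.140 0.209 0.181 0.143 0.274
"""


def full_matrix(text):
    """Expand the lower triangle into a symmetric matrix with a zero diagonal."""
    rows = [line.split()[1:] for line in text.strip().splitlines()]
    n = len(rows) + 1
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, start=1):
        for j, value in enumerate(row):
            d[i][j] = d[j][i] = float(value)
    return d


def double_center(d):
    """B = -1/2 * J * D^2 * J, the Gram matrix used by classical MDS."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    total = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iterations=1000):
    """Largest-magnitude eigenvalue and unit eigenvector by power iteration."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v


def classical_mds(d, dims=2):
    """Return one coordinate list per component, `dims` values each."""
    b = double_center(d)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # guard against a negative eigenvalue
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so the next pass finds the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


coords = classical_mds(full_matrix(FST_TEXT))
for label, (x, y) in zip(["C%d" % k for k in range(1, 8)], coords):
    print(label, round(x, 3), round(y, 3))
```

The signs of the axes are arbitrary, so a plot made from these coordinates may be a mirror image of one made with other software.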

Harappa Ancestry Project

Genetics and South Asia
