A week later, here is some more Admixture analysis of the Reference I dataset.
As usual, the results are available in a spreadsheet, which is also listed on my sidebar.
Let's start with K=10.
Component | Label | Component | Label |
---|---|---|---|
C1 | South Asian | C2 | Kalash |
C3 | Southwest Asian | C4 | Southeast Asian |
C5 | European | C6 | Papuan |
C7 | Northeast Asian | C8 | Siberian |
C9 | West African | C10 | East African |
The addition here is basically the Siberian component, which is highest among the Yakut.
Fst divergences between estimated populations for K=10:
Fst | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 |
---|---|---|---|---|---|---|---|---|---|
C2 | 0.057 | | | | | | | | |
C3 | 0.064 | 0.073 | | | | | | | |
C4 | 0.089 | 0.127 | 0.136 | | | | | | |
C5 | 0.063 | 0.061 | 0.038 | 0.131 | | | | | |
C6 | 0.167 | 0.209 | 0.215 | 0.202 | 0.210 | | | | |
C7 | 0.080 | 0.120 | 0.129 | 0.032 | 0.123 | 0.190 | | | |
C8 | 0.085 | 0.117 | 0.127 | 0.059 | 0.118 | 0.203 | 0.039 | | |
C9 | 0.152 | 0.174 | 0.161 | 0.201 | 0.171 | 0.266 | 0.195 | 0.199 | |
C10 | 0.115 | 0.133 | 0.117 | 0.166 | 0.128 | 0.233 | 0.160 | 0.163 | 0.036 |
Now for K=11,
Component | Label | Component | Label |
---|---|---|---|
C1 | South Asian | C2 | Kalash |
C3 | Southwest Asian | C4 | Southeast Asian |
C5 | European | C6 | Papuan |
C7 | Siberian | C8 | Northeast Asian |
C9 | East African Bantus | C10 | West African |
C11 | East African | | |
C8 at K=11 is now modal among the Han instead of the Japanese. This affected the Southeast Asian component (C4), which is now more of a real Southeast Asian one.
The new ancestral component C9 is found among the Bantus of eastern and southern Africa. It is highest among the Luhya and Bantus of Kenya.
Fst divergences between estimated populations for K=11:
Fst | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 |
---|---|---|---|---|---|---|---|---|---|---|
C2 | 0.055 | | | | | | | | | |
C3 | 0.062 | 0.072 | | | | | | | | |
C4 | 0.081 | 0.120 | 0.128 | | | | | | | |
C5 | 0.063 | 0.063 | 0.038 | 0.124 | | | | | | |
C6 | 0.169 | 0.211 | 0.215 | 0.195 | 0.213 | | | | | |
C7 | 0.089 | 0.128 | 0.135 | 0.057 | 0.130 | 0.203 | | | | |
C8 | 0.083 | 0.122 | 0.131 | 0.031 | 0.127 | 0.194 | 0.039 | | | |
C9 | 0.143 | 0.165 | 0.150 | 0.185 | 0.162 | 0.259 | 0.195 | 0.189 | | |
C10 | 0.152 | 0.174 | 0.160 | 0.194 | 0.172 | 0.268 | 0.203 | 0.198 | 0.014 | |
C11 | 0.104 | 0.122 | 0.101 | 0.149 | 0.115 | 0.226 | 0.158 | 0.152 | 0.037 | 0.043 |
At K=12,
Component | Label | Component | Label |
---|---|---|---|
C1 | South Asian | C2 | Balochistan/Caucasus |
C3 | Kalash | C4 | Southeast Asian |
C5 | Southwest Asian | C6 | European |
C7 | Papuan | C8 | Northeast Asian |
C9 | Siberian | C10 | East African Bantus |
C11 | West African | C12 | East African |
The Kalash component has split, with an assist from Southwest Asian, into a pure Kalash component (C3) and a Balochistan/Caucasus component (C2), which is highest in southwestern Pakistan (Brahui, Makrani, Balochi) at 60-57%, followed by Georgians, Lezgin, Adygei, Azerbaijan Jews and Iranian Jews (56-50%).
The Southwest Asian component (C5) is now more of a Southwest Asian and North/Northwest African component. The West Asian element in it has been reduced.
The Northeast Asian component (C8) is now again centered on Japan. I have a solution for this movement which I'll apply in my next round of analysis.
Fst divergences between estimated populations for K=12:
Fst | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 |
---|---|---|---|---|---|---|---|---|---|---|---|
C2 | 0.057 | | | | | | | | | | |
C3 | 0.066 | 0.060 | | | | | | | | | |
C4 | 0.089 | 0.124 | 0.136 | | | | | | | | |
C5 | 0.075 | 0.057 | 0.087 | 0.142 | | | | | | | |
C6 | 0.066 | 0.040 | 0.073 | 0.130 | 0.048 | | | | | | |
C7 | 0.167 | 0.205 | 0.219 | 0.202 | 0.220 | 0.210 | | | | | |
C8 | 0.080 | 0.117 | 0.128 | 0.032 | 0.134 | 0.122 | 0.190 | | | | |
C9 | 0.085 | 0.114 | 0.126 | 0.059 | 0.133 | 0.117 | 0.203 | 0.039 | | | |
C10 | 0.145 | 0.154 | 0.176 | 0.192 | 0.154 | 0.162 | 0.258 | 0.187 | 0.190 | | |
C11 | 0.154 | 0.163 | 0.186 | 0.201 | 0.164 | 0.172 | 0.266 | 0.195 | 0.199 | 0.014 | |
C12 | 0.107 | 0.109 | 0.135 | 0.157 | 0.105 | 0.116 | 0.225 | 0.151 | 0.154 | 0.035 | 0.041 |
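For anyone who wants to rank component pairs by divergence rather than eyeball the triangle, here is a minimal Python sketch. It only re-types the K=12 values from the table above; the names `LOWER_K12`, `fst_pairs` and `closest_pairs` are my own and are not part of Admixture or of the project's spreadsheet.

```python
# Lower triangle of the K=12 Fst table, copied row by row (rows C2..C12).
# Row for Ci holds its divergence from C1, C2, ..., C(i-1).
LOWER_K12 = [
    [0.057],
    [0.066, 0.060],
    [0.089, 0.124, 0.136],
    [0.075, 0.057, 0.087, 0.142],
    [0.066, 0.040, 0.073, 0.130, 0.048],
    [0.167, 0.205, 0.219, 0.202, 0.220, 0.210],
    [0.080, 0.117, 0.128, 0.032, 0.134, 0.122, 0.190],
    [0.085, 0.114, 0.126, 0.059, 0.133, 0.117, 0.203, 0.039],
    [0.145, 0.154, 0.176, 0.192, 0.154, 0.162, 0.258, 0.187, 0.190],
    [0.154, 0.163, 0.186, 0.201, 0.164, 0.172, 0.266, 0.195, 0.199, 0.014],
    [0.107, 0.109, 0.135, 0.157, 0.105, 0.116, 0.225, 0.151, 0.154, 0.035, 0.041],
]

# Component labels as given in the K=12 label table.
LABELS_K12 = {
    "C1": "South Asian", "C2": "Balochistan/Caucasus", "C3": "Kalash",
    "C4": "Southeast Asian", "C5": "Southwest Asian", "C6": "European",
    "C7": "Papuan", "C8": "Northeast Asian", "C9": "Siberian",
    "C10": "East African Bantus", "C11": "West African", "C12": "East African",
}


def fst_pairs(lower):
    """Map each (column component, row component) pair to its Fst value."""
    pairs = {}
    for row_idx, row in enumerate(lower, start=2):      # first row is C2
        for col_idx, value in enumerate(row, start=1):  # first column is C1
            pairs[(f"C{col_idx}", f"C{row_idx}")] = value
    return pairs


def closest_pairs(pairs, n=5):
    """Return the n pairs with the smallest Fst, smallest first."""
    return sorted(pairs.items(), key=lambda item: item[1])[:n]


pairs = fst_pairs(LOWER_K12)
for (a, b), fst in closest_pairs(pairs, n=5):
    print(f"{LABELS_K12[a]} vs {LABELS_K12[b]}: {fst:.3f}")
```

Swapping in the K=10 or K=11 triangle (with the matching labels) works the same way, since the function only assumes the rows start at C2.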
Admixture analysis at higher K values will continue.
Is C2 the "Dagestani" component?
Since it's a little higher in southwestern Pakistan than in Daghestan, the label Daghestani is not as appropriate in my opinion.
It seems similar to Dodecad's Daghestani component, but I think this one is higher among the Punjabis etc. than the Daghestani component was.
A geographic projection of the results as contour maps for selected admixture components would be incredibly useful for determining labels.
I agree. If I can figure out a way it would be great.
I know I can do country-level maps easily but we need some better detail in South Asia.
Anyone know of any software we can use to do gradient maps of the world?
I think MATLAB's Mapping Toolbox can do that: http://www.mathworks.com/products/mapping/
Thanks! Unfortunately I don't have Matlab at home. So I am looking at R's mapping libraries. It'll take some effort to associate the ethnicities with different regions but a map will be ready one day. 🙂
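One rough way to get a gradient surface without MATLAB is inverse distance weighting: each grid cell gets a distance-weighted average of the component percentages at the sampled populations. The sketch below is plain Python and only illustrates the idea. The sample coordinates and values in `SAMPLES` are made-up placeholders, not project results, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def idw(samples, lat, lon, power=2.0):
    """Inverse-distance-weighted estimate at (lat, lon).

    samples is a list of (lat, lon, value) tuples, one per population.
    """
    num = den = 0.0
    for s_lat, s_lon, value in samples:
        d = haversine_km(s_lat, s_lon, lat, lon)
        if d < 1e-9:           # exactly on a sample point: return its value
            return value
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den


def idw_grid(samples, lats, lons, power=2.0):
    """Evaluate idw() on every (lat, lon) combination; rows follow lats."""
    return [[idw(samples, la, lo, power) for lo in lons] for la in lats]


# Placeholder inputs for illustration only (NOT real admixture results):
# (latitude, longitude, component fraction)
SAMPLES = [(30.0, 67.0, 0.60), (42.0, 44.0, 0.55), (20.0, 78.0, 0.20)]
lats = [20.0, 30.0, 40.0]
lons = [50.0, 60.0, 70.0, 80.0]
grid = idw_grid(SAMPLES, lats, lons)
for row in grid:
    print(" ".join(f"{v:.2f}" for v in row))
```

The resulting grid could then be handed to a contour plotting routine (for example matplotlib's `contourf`, if it is available). The hard part remains what the comment above says: deciding which coordinates to assign to each ethnic group.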
At K=12, the European and Pakistani/Caucasian components have one of the lowest Fst divergences on the table. The only one I see lower is the one between the East African and East African Bantu components.
Oh, and K=12 would be a good K value for future runs using project participants, since the presence of the Pakistani/Caucasian component fits the project's focus.
Already on it. 🙂
Zack, I'm trying to tie this in with these pieces.
http://blogs.discovermagazine.com/gnxp/2010/12/some-of-the-indo-europeans-found/
http://blogs.discovermagazine.com/gnxp/2010/12/south-asians-too-are-sons-of-the-farmers/
I am also having a Sepia Mutiny discussion on genes with Razib. Just to clarify, I'm not versed in the science, so I just like skimming through the analysis.
I'm aiming for a cohesive narrative but then I probably will be making myself more confused since I don't understand many of the constituent parts. We need more theories people and random speculations lol 😛
If you look at Dienekes's bar plot which has the Dagestani component, you'll notice that it peaks among the Lezgin and is fairly low among the Baloch.