Mithra asked:
Almost all the Chinese are now around 50% SE Asian, didn’t see this before is it right.
So I decided to look at the Chinese samples in the Reference I dataset.
I ran Admixture on the whole Reference I dataset for K=10 ancestral populations. The green component is what I call Southeast Asian, blue is Northeast Asian (highest among the Japanese) and violet is Siberian (highest among the Yakut).
Here is the plot for the 106 HapMap Chinese samples from Denver (label: us chinese):
HapMap US Chinese
For the 137 HapMap samples from Beijing, China (label: han chinese):
HapMap Han Chinese
For the 34 HGDP Han samples (label: han):
HGDP Han
For the 10 HGDP Han samples from North China (label: han-nchina):
HGDP Han North China
As you can see, the "Southeast Asian" component goes down from the top group to the bottom one, which is as expected.
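If you want to check a trend like this numerically rather than by eye, one option is to average a component over each group. Below is a minimal Python sketch, assuming the Admixture proportions have already been read into rows (one per individual, one column per component) with a matching list of population labels. The function name `mean_component_by_group` and the toy numbers are made up for illustration; they are not the values behind the plots above.

```python
from collections import defaultdict


def mean_component_by_group(q_rows, labels, component):
    """Average one ancestry component (a column index) over each population label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row, label in zip(q_rows, labels):
        totals[label] += row[component]
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


# Toy proportions for illustration only; NOT the results from this post.
toy_q = [
    [0.50, 0.40, 0.10],
    [0.46, 0.44, 0.10],
    [0.30, 0.60, 0.10],
]
toy_labels = ["us chinese", "us chinese", "han-nchina"]

# Column 0 plays the role of the "Southeast Asian" component here.
means = mean_component_by_group(toy_q, toy_labels, component=0)
for label, value in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {value:.2f}")
```

Sorting the groups by their mean makes the top-to-bottom ordering easy to compare with the plots.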
I wasn't satisfied with these results, so I decided to run Admixture on the East Asian samples in Reference I separately.
East Asian Admixture K=3
At K=3, the results are about the same as at K=10 for the whole Reference I dataset. The Han all have a significant amount of the blue component, which is highest among the Southeast Asians.
East Asian Admixture K=4
At K=4, we get a Chinese ("East Asian") component. So we have Japanese, Chinese, Yakut and Southeast Asian components. This is what most of you were probably expecting.
Why did the Japanese become the modal population for the Northeast Asian component? I ran a PCA on the East Asian data to see how the different populations looked on a PCA plot. Remember that eigenvector 1 explains 1.49 times the variance of eigenvector 2 and 1.9 times the variance of eigenvector 3. Thus, eigenvector 2 explains 1.28 times the variation explained by eigenvector 3.
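The last ratio follows from the first two: if eigenvector 1 explains 1.49 times the variance of eigenvector 2 and 1.9 times that of eigenvector 3, then dividing one ratio by the other gives eigenvector 2 relative to eigenvector 3. A quick Python check of that arithmetic, using only the two ratios quoted above (the variable names are mine):

```python
# Variance ratios quoted in the text.
ratio_1_to_2 = 1.49  # eigenvector 1 vs eigenvector 2
ratio_1_to_3 = 1.9   # eigenvector 1 vs eigenvector 3

# (var1 / var3) / (var1 / var2) = var2 / var3
ratio_2_to_3 = ratio_1_to_3 / ratio_1_to_2

print(round(ratio_2_to_3, 2))  # 1.28
```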
East Asian PCA eig1 vs eig2
East Asian PCA eig1 vs eig3
East Asian PCA eig2 vs eig3
As you can see, the Yakut are the farthest away, but the Japanese are also fairly well separated from the Chinese populations.
If I didn't have the 141 Japanese samples in my reference dataset, the Northeast Asian component would most likely be centered on the Han, which is the case for Dodecad.
I think this shows that it is not correct to think of the ancestral components inferred by Admixture as pure ancestral populations.