I have been working on creating 100% ASI (Ancestral South Indian) samples recently. So it was really interesting that Dienekes did similar experiments:
- How to create Zombies from ADMIXTURE etc.
- More Zombies: Ancestral North Indians and Ancestral South Indians reborn
- ANI/ASI analysis of HGDP Pakistan groups
I am going about creating the "pure" allele frequencies somewhat differently, so comparing the two approaches should be a useful exercise.
Anyway, I thought you guys would be itching for some new results. So here's a PCA plot:
This uses the same Principal Component Analysis as the one here, based on the 96 Indian Cline samples, Utahn Whites and Onge. However, I projected three extra "populations" onto this plot.
These three populations are simulated genetic data for 25 individuals, generated using the allele frequencies from the Reference 3 Admixture results.
- Onge11 is generated from the Onge (C2) component from K=11 admixture for Reference 3.
- SA11 is generated from the South Asian (C1) component from the same K=11 admixture.
- SA12 is generated from the South Asian (C1) component from the K=12 admixture.
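For readers who want to try something similar, here is a minimal sketch of how individuals could be simulated from one component's allele frequencies. It is not necessarily the exact procedure used for the plot. It assumes independent SNPs and Hardy-Weinberg proportions, so each genotype is a binomial draw of two alleles. The function name `simulate_genotypes` and the five example frequencies are made up for illustration; real frequencies would come from the allele frequency output of the admixture run.

```python
import numpy as np

def simulate_genotypes(allele_freqs, n_individuals=25, seed=0):
    """Draw diploid genotypes (0, 1 or 2 copies of the counted allele) per SNP."""
    rng = np.random.default_rng(seed)
    p = np.clip(np.asarray(allele_freqs, dtype=float), 0.0, 1.0)
    # Assumption: two independent allele draws per individual per SNP
    # (Hardy-Weinberg proportions, no linkage between SNPs).
    return rng.binomial(2, p, size=(n_individuals, p.size))

# Hypothetical frequencies for five SNPs, standing in for one component's column.
component_freqs = [0.0, 0.1, 0.5, 0.9, 1.0]
genotypes = simulate_genotypes(component_freqs)
print(genotypes.shape)  # rows are individuals, columns are SNPs
```

Because every SNP is drawn independently, the simulated individuals carry no linkage structure, which is worth keeping in mind when they are compared with real samples.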
As you can see, the SA12 population lies between 100% ASI and the Indian Cline samples.
The Onge11 generated samples are a bit beyond 100% ASI on the first principal component, but they are also shifted towards the real Onge on PC2.
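The projection step mentioned above, where extra samples are placed on axes computed only from the reference samples, can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the actual pipeline: it uses plain mean-centering and an SVD, whereas real genotype PCA tools often also scale each SNP. The function names `fit_pca` and `project` and the random stand-in data are hypothetical.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Return per-SNP means and the top principal axes of the reference genotypes."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on existing axes without letting them influence those axes."""
    return (np.asarray(samples, dtype=float) - mean) @ axes.T

rng = np.random.default_rng(1)
reference = rng.binomial(2, 0.3, size=(40, 200))  # made-up stand-in for the real samples
extra = rng.binomial(2, 0.3, size=(25, 200))      # made-up stand-in for simulated samples

mean, axes = fit_pca(reference)
ref_coords = project(reference, mean, axes)
extra_coords = project(extra, mean, axes)
print(ref_coords.shape, extra_coords.shape)
```

Projecting rather than refitting keeps the axes defined by the real samples only, so the simulated populations cannot pull the principal components towards themselves.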
At K=12 it looks like the isolated Onge eats into South Asian variation. As a result, SA12 may be shifted orthogonally towards Onge slightly more than it really should be. Perhaps you can try a similar experiment for the SA component in the newest reference to see if you get similar results.