Genetic Evidence for Recent Population Mixture in India

Finally, the paper I had been waiting for ever since the conference presentations on ANI-ASI admixture dating by Moorjani et al of the Reich Lab is out:

Moorjani et al., Genetic Evidence for Recent Population Mixture in India, The American Journal of Human Genetics (2013), http://dx.doi.org/10.1016/j.ajhg.2013.07.006

Here's the abstract:

Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.

In this paper, Moorjani et al calculate the ANI (Ancestral North Indian) percentage using f4 ratio estimation (Table S4 in the supplement).

Compared with Reich et al, they changed the outgroup from Papuan to Yoruba and the population that forms a clade with ANI from CEU (Utahn Whites) to Georgians. I think both are much better choices. Looking at the D-statistics in Table S2, Georgians are definitely an appropriate choice for forming a clade with ANI.
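
To make the f4 ratio idea concrete, here is a minimal sketch, not the paper's pipeline. It assumes the ratio takes the form f4(Yoruba, Basque; Indian, Onge) / f4(Yoruba, Basque; Georgian, Onge), which matches the roles discussed in this post and its comments (Yoruba as outgroup, Georgians in a clade with ANI, Onge related to ASI, Basque as the outer West Eurasian population). The function names and all allele frequencies below are made up for illustration.

```python
def f4(a, b, c, d):
    """Mean of (a - b) * (c - d) across SNPs, given allele frequencies."""
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def ani_fraction(yoruba, basque, indian, georgian, onge):
    """f4 ratio: share of ancestry from the Georgian-related (ANI-like) side."""
    return f4(yoruba, basque, indian, onge) / f4(yoruba, basque, georgian, onge)


# Hypothetical allele frequencies at five SNPs (not real data).
yoruba = [0.10, 0.80, 0.35, 0.60, 0.25]
basque = [0.40, 0.50, 0.70, 0.20, 0.55]
georgian = [0.45, 0.45, 0.75, 0.15, 0.60]
onge = [0.20, 0.70, 0.30, 0.65, 0.10]

# Build a toy "Indian" group as an exact 60/40 blend of Georgian and Onge.
true_ani = 0.6
indian = [true_ani * g + (1 - true_ani) * o for g, o in zip(georgian, onge)]

estimate = ani_fraction(yoruba, basque, indian, georgian, onge)
print(round(estimate, 3))
```

On this toy input the ratio recovers the 60% used to build the blend. Real estimates are computed over a very large number of SNPs and come with error margins, so they should not be read as exact.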

[Figure: rolloff admixture dates for Dravidian and Indo-European groups]

Another important result from the paper is the difference in the date of admixture between Dravidian speakers (108 generations, or 3,132 years) and Indo-European speakers (72 generations, or 2,088 years).
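
The year figures follow from multiplying generations by an assumed generation time. A minimal sketch, assuming 29 years per generation (the value that reproduces the numbers above) and, for comparison, a hypothetical 25-year generation; the function name is illustrative.

```python
def generations_to_years(generations, years_per_generation=29):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


for label, gens in [("Dravidian", 108), ("Indo-European", 72)]:
    print(label, generations_to_years(gens), generations_to_years(gens, 25))
```

Shortening the generation time to 25 years moves both dates a few centuries closer to the present, so the year values are only as firm as that assumption.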

Testing for multiple waves of admixture, they find that multiple waves are more likely in upper-caste and middle-caste Indo-European groups, and that the admixture history of many Indian groups is more complex than a single event.

[Figure: ANI-ASI single admixture]

UPDATE: Razib and Dienekes comment.

130 Comments.

  1. Thank you Zack, this was very informative. My question is, did this separation start around the time period when the Caste system was beginning to evolve?

    • The caste system and its endogamy presumably started after the ANI-ASI admixture.

    • The Caste system evolved later, and was then carried east -- a variation of it appears in the Hebrew/Israelite tribal structure ... which Josephus tied back to India in his writings, about 1900 years ago.

  2. Very interesting. Are there any new samples that you plan to utilize?

    • Moorjani et al have a fair number of new samples, some from interesting populations. The paper just came out so I don't know if they will be sharing their data with other researchers.

  3. Zack,

    Can you add the other 10 Kashmiri Pandit samples? I am curious to see their variation. Also, do you know the background of the Chamar samples? I'm curious if they are Punjabi or not? Finally, how accurate is the new ANI-ASI formula compared to your linear regression and Reich's estimates? Will you be able to use it to do new estimates?

    Regards,

    Paul

    • I expect the new Kashmiri Pandit samples to be similar to the earlier ones based on the reported results. But we'll see.

      The Chamar from Metspalu et al are from UP I believe.

      I believe Moorjani et al have a better result for ANI than Reich et al due to more appropriate non-Indian populations. My regression formula was just a rough effort based on Reich et al results and should not be taken too seriously.

  4. Okay, thanks. No problem if you don't feel like adding them. I am curious about Moorjani's ANI estimates because they seem to have all decreased by 6-8% compared to Reich's estimates for various populations, and some have decreased more than others. Your estimates were almost on par with Reich's overall. Why would Moorjani using Yoruba and Georgians instead of Reich using Utahn Whites (mostly of Northern Euro ancestry) give more accurate results? It's all very interesting because the differences in estimates are quite significant. If you could choose outgroups, what populations would you choose?

    • My estimates were not independent. They were based on a simple linear regression on my Admixture computation and Reich et al's ANI estimates.

  5. Sorry, I forgot to mention that Reich also used Papuan instead of Yoruba. When looking for outgroups to represent ANI and ASI, wouldn't it be better to use populations that are closer to the original ANI and ASI "source" populations? So, Georgian or some other Caucasus population makes sense for ANI but isn't Yoruba terrible for ASI? Something like the Papuan or Australoid/Negrito populations work much better in my opinion. Of course, the closest you could probably get is the Andamanese but I suppose they have South Asian admixture right?

    Also, my estimates are wrong, ANI estimates have decreased by 3-5%. I read the spreadsheet in his study wrong. My apologies.

    • Onge is what they use for ASI in both Reich et al and Moorjani et al. Papuan/Yoruba was used as an outgroup. Yoruba is definitely a better choice because Papuans have Denisovan admixture.

      Moorjani et al used Georgian for close to ANI which is a much better choice than Utahn Whites or Adygei in Reich et al. Utahn Whites are not as close to South Asians' Western Eurasian ancestry and they also have a little Amerindian-like admixture. Adygei are from the Caucasus, so they are a better pick than Utahns but they have some East Eurasian admixture that would be problematic.

      The D-statistics in the supplement show why Georgians are the best choice.

      • Okay. I get what you're saying, but overall isn't the West Eurasian ancestry in South Asians closer to a population like the Baloch if they lacked the South Indian component? Is there any way to manipulate the data to create a reference population that is like the Baloch or Brahui without the South Indian component and use them as the ANI? I understand the idea behind the D-statistic and how the Baloch and Caucasian component are closely related but I'm still curious if a better population could be used.

        • The whole point is to use a population as ANI proxy which does not have any ASI like admixture.

          • Zack,

            I understand that. I was essentially asking if you could manipulate the data to create a Baloch population that lacked any of the ASI ancestry that is found in the South Indian component?

          • Not an easy process. Some work to reconstruct Native American genomes has been going on but still a long way to go.

  6. It looks like Moorjani did two ANI estimates and compared his results to Reich. One by representing ANI through Basques and another estimate by representing the ancestry through the Abkhasians. Both populations representing ANI instead of the Utahn Whites have lowered the ANI estimates by 4-9%. There doesn't seem to be a general trend in which population (Basque or Abkhasian) gives a lower ANI estimate. It varies by population. However, all of Moorjani's estimates, whether using the Basque or Abkhasian, give much lower estimates overall. Strange. I don't know which estimates to take seriously at all.

    • Moorjani doesn't use Abkhasians or Basques as ANI substitutes. She uses Georgians for that. The Basques and Abkhasians are the outer population on that branch, Pop2 in figure S2 in the supplements.

      • Thanks for clearing that up. However, how come the ANI estimates are different depending whether the Basques or Abkhasians are the outer population? Perhaps, I'm reading it wrong. Could you post an update with Moorjani and Reich's estimates?

        • The ANI estimates are in Table S4 in Supplementary Info which is open to everyone.

          For most groups, the difference between the ANI percentage using Basques vs Abkhasians is small and within the error margins. I think the largest difference is for Pathans.

    • Also you should take both Reich et al and Moorjani et al seriously. They are very important papers on South Asian population genetics. And you shouldn't think of these ANI estimates as very accurate and precise. You'll notice that they provide error margins.

      • Fair enough but I'm surprised at how different Reich and Moorjani's estimates are. The Sindhis are at 64.3 % vs 73.7%, Kashmiri Pandits are at 65.2% vs. 70.6% and Pashtuns are at 70.4% vs 76.9%. These discrepancies are too large for margin of error.

        http://genetics.med.harvard.edu/reich/Reich_Lab/Welcome_files/2009_Nature_Reich_India.pdf

        • the reasons for the discrepancy are outlined in the paper, as zack has noted having to do with the outgroups. there is no difference between reich and moorjani since it is out of the same lab. if the same lab gives updated statistics you can rest assured that they are more confident in the second than the first.

          • What Razib said.

            Also, Paul, you are comparing the f3 ancestry estimation numbers from Reich et al. If you look in the paper's supplementary information, you'll see Reich et al also reported an f4 ancestry estimate, which is what they use in Moorjani et al with different populations. Thus you can directly compare the effect of changing the populations.

          • Yes, I understand that it must be down to the difference in using Utahn Whites and Georgians. I was just surprised at how significant some of the differences are. For Sindhis, nearly 10%! I have a question though, Razib. Does using Georgians as a proxy for ANI accurately account for all of the West Asian-like admixture components on Harappa besides the Baloch and Caucasian components, for example the NE Euro, Mediterranean and SW Asian components?

          • @Zack

            I can't find the f4 ancestry estimates. Where are they located in the paper?

          • Paul, check out the supplemental data. Sindhi is 70.7 and the Pandits are 69.3

        • Hi Paul,

          I think Zack is making a great point concerning precision and (absolute) accuracy. If you look at the PCA plot, the Sindhi are west of the Kashmiri Pandits, and are slightly more shifted towards Europeans and West Asians on the North-South Eurasian axis (but very slightly). So one would expect them to have a higher "ANI" percentage. But not the case here. I think the percentages are rather fuzzy and tentative, but that isn't unusual. For what it's worth, they did exclude quite a few Sindhi and Pashtun samples (but I think these are the same individuals excluded in the earlier paper, and African ancestry in the Sindhi is presumably the main reason for this. I don't know this for sure, but I think the Pashtun samples with probable recent ancestry from West or Central Asia were excluded. Out of 23 samples, they used 15).

          • @HRP0282

            I agree. However, the fact that you're pointing out about the Sindhis is definitely confusing me. They should be slightly more West Eurasian overall than the Kashmiri Pandit samples based on their clustering on PCA plots. Although, I believe that Moorjani used all 15 Kashmiri Pandit samples this time rather than the 5 Reich used during his estimates. If I'm wrong, please correct me.
            So, you might be on to something with your analysis that the percentages are rather "fuzzy."

            @Zack

            In fact, I'm skeptical if Moorjani's ANI estimates account for all the West Eurasian ancestry in South Asians. In other words, all of the West Eurasian admixture components on the Harappa Ancestry Project (i.e. Baloch, Caucasian, NE Euro, Mediterranean and SW Asian). Not to mention that the South Indian component on Harappa Ancestry Project is a mix between majority (ASI) and minority (ANI) and that's why it is slightly shifted toward West Eurasians rather than East Eurasians even though the ASI "population" itself is slightly closer to East Eurasians.

            For example, if you add the West Eurasian like admixture components on HAP (Baloch + Caucasian, NE Euro + Mediterranean + SW Asian) for the Kashmiri Pandits, Sindhis and Pashtuns, you get 61%, 67% and 72% respectively. Now, if you were to separate the "supposed" minor ANI from the South Indian component, those percentages would increase in various amounts for each population.

            Although, many people have attempted to get rough estimates of the ANI% by calculating the ASI% instead using your linear regression formula (based off Reich's work) and subtracting the ASI% from 100% assuming the population has no real East Eurasian admixture and is only made up of the ANI and ASI components. For the populations with minor but real amounts of East Eurasian components (NE Asian, SE Asian, Siberian, Beringian, American, etc.), you would have to factor that in separately.

            In regards to the Sindhis and Pashtuns excluded, I've been told by Parasar that they were excluded because of West African admixture which sounds plausible for Sindhis (Zack has 4 samples ranging from 2-10% West African) but quite strange for the Pashtuns. Anyways, Zack has 23 of the HGDP Pashtuns in his project. None seem to have any recent West Asian or Central Asian ancestry other than HGDP00214 (about 8% East Eurasian) and HGDP00220 (about 30% East Eurasian).

            @Curious

            Where in the supplemental data did you find those numbers? I don't see it in table S4: Ancestry estimates from F4 Ratio Estimation on page 14 in the pdf below.

            http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929713003248.mmc1.pdf

          • @HRP0282

            Sorry, my second to last paragraph is directed toward you and not Zack. My apologies.

          • @Paul

            The paper is suspect for several reasons. My main point of contention is about their biased range of sampling.

            Notice how Moorjani et al have made a more thorough investigation of the regional caste diversity of UP and the Dravidian States (but have ignored Brahmin, vaishya and kshatriya ranges for the NW/W states). They make up for their shortsightedness and lack of professionalism by including Jain samples to represent Gujarat, when Jains are a fringe minority religious caste who make up less than 1% of the total demographics of that state. Talk about selective biases.

            Another critique is that only UP Muslim samples are being included in this paper, and Moorjani et al makes no attempt to distinguish them, which surely underestimates the extent of UP Muslims who have recent foreign ancestry. Muslims from other states are largely being ignored time and again for some strange unknown reason.

            As usual, Reich et al are more concerned with the politics of Brahmins and Hindu upper-castes in keeping with their religious agenda, than with understanding and resolving the mystery of groups such as Muslims and Jats who actually have a more recent history with West Asia. That in itself should tell you something.

            Either this has to do with them not being able to get a more thorough sampling of the NW region, or them (due to their religious biases) purposefully leaving out NW/W caste populations in order to manipulate the data in their paper so that it seems Brahmins are genetically closer to Sindhis and West Asians than they really are.

          • @orange,

            No doubt, more samples and populations would help, but you have to admit that populations on the western periphery - Makrani, Pathan, Balochi, Brahui, Kalash, Hazara, Burusho have been sampled. That in conjunction with Zack's data-set should give us a very good idea.

          • @Parasar

            I would be cautious about the controversial ideas being presented in this paper. Moorjani et al have undersampled NW/W caste populations for a reason that is known only to them. However, that is not a good excuse for their lack of professionalism given that they knew the ripples this "discovery" would cause in most science academia circles.

            The problem is that most people won't be using Zack's data-set populations (which is still hugely lacking in NW/W Brahmin and kshatriya samples from states such as Rajasthan, Gujarat and Maharashtra) to put the two and two together, and will only see what they want to see.

            Why can't normal NW caste populations be sampled in these Projects for once? That is the question.

          • I don't see any agenda or scientific controversy. In fact, Reich Lab has helped improve our understanding of South Asian population genetics quite a bit.

          • @Zack

            There is no doubt that this groundbreaking paper has helped to shape our understanding of the complex demographic processes in India which led to the widespread cultural shift in endogamy.

            However, I personally think that the paper could have been much more comprehensive in including a wider range of NW/W Indian caste populations and Pakistanis. From Moorjani et al's genome-wide data of 73 groups from the Indian subcontinent, the vast majority are Indian, with only 2 Pakistani group samples.

            This obviously means the research is biased towards India and not Pakistan.

            29 Dravidian vs 22 Indo-European India group samples (discounting the Pathan and Sindhi of Pakistan).

            That also means the research is biased towards Indian Dravidian-speakers.

            Of India's Indo-Europeans, UP was oversampled relative to the other North Indian states, which were sampled only for tribal groups.

            My ethnic caste group is not even mentioned in here.

            My question is this: if Moorjani et al have not taken a wider and more thorough range of samples from NW/W caste populations then how can they profess to make bold statements such as this:

            "It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years"

            Moorjani et al know there is much scope for error in their research and include a disclaimer in the paper implying the work is still preliminary:

            "(although it is possible that with further sampling and new methods such relatedness might be detected)"

            Other geneticists have made the same criticisms-

            "Lynn Jorde, a geneticist at the University of Utah in Salt Lake City, calls the results “intriguing,” but cautions that they need to be confirmed with a larger number of samples from even more regions of the Indian subcontinent, as well as with the use of complete DNA sequences from the entire genomes of all the individuals studied."

            http://news.sciencemag.org/2013/08/india%E2%80%99s-fragmented-society-was-once-melting-pot

          • They did consider other Pakistani populations, Baloch, Makrani, Brahui, Burusho and Kalash but had to exclude them because they are not simple ANI-ASI admixture.

            I would have liked them to consider Punjabis, but currently only I have a decent number of Punjabi samples, though that will change soon when the 1000genomes South Asian data becomes available.

  7. It is quite interesting that Moorjani finds Kashmiri Pandits to have slightly more ANI than Sindhis (65.2 vs 64.3). The Reich paper, had it the other way round, with Sindhis being more ANI than the Pandits (74% vs 71% f3 ancestry estimation). Furthermore, the UP Brahmins are very close to Sindhis in terms of ANI (63 vs 64).

    This is what Razib had to say about the Baloch component in his recent post:

    "It seems plausible to me that this widespread Baloch fraction is reflective of the initial ANI-ASI admixture event."

    • @Curious

      If that's true, what does it suggest about the fact that the fst distance between the Baloch and Caucasian components is very small, and that many West Asian populations have 5-30% of the Baloch component? In particular, Iranians have 27% and the Lezgin have 28%. Many Caucasus populations have it at around 20% as well.

      Taking a look back at the ANI estimates by Moorjani, I'm not sure if any West Eurasian population truly makes a good outgroup to represent ANI. Assuming the majority of West Eurasian ancestry in South Asians is represented by the Baloch component and to a lesser extent the NE Euro and Caucasian components, the Georgians (Abkhasians as used by Moorjani) and Basques are still too different from the Baloch and lesser amounts of the NE Euro and Caucasian components found in South Asians. If the Baloch lacked the South Indian component, I think they would be a great representative.

      • I think the Baloch in West Eurasian populations represents migration from the Indian subcontinent.
        Anyways, if you add up all the harappa West Asian like components along with the Baloch in Sindhis, you will notice that it adds up to >65 percent. It could just be differences in sampling, who knows, but my belief is the Baloch in Sindhis is fused with ASI.
        Also notice that the Baloch component is the farthest from Caucasian compared to other West Asian like components. NE European and Mediterranean are closer. Why do you think that is? Note, I am not counting North African as a different component.
        Let's keep an open mind. I don't see why Indians could not have migrated to West Asia in ancient times.
        Also, a paper I read recently about R1a mentions that at one point in the distant past, R1a folks did migrate from India to Central Asia.

        • Perhaps, but individuals were suggesting the Baloch component came to South Asia from Neolithic farmers from around eastern Anatolia. Now it may be plausible that the Baloch component represents South Asian ancestry in West Asian groups?

          Also, yes, the Baloch component is southeastern shifted but still fairly close overall. In terms of adding up Harappa components, the South Indian component is supposedly West Asian shifted and has a smaller fst distance to West Eurasian components than East Eurasian ones. If the component wasn't admixed, the South Indian component would be closer to East Eurasian components because ASI is marginally closer to East Eurasians rather than West Eurasians.

        • R1a-L657 is thought to have originated in South Asia and then moved into the Middle East. Y7+ has been found among Arabs and South Asians alike. Z2123 is another one that's found in South Asia, the Middle East and Europe. These migrations could have contributed to South Asian admixture found in West Asia.

    • The difference between Pandits and Sindhis is within the error margin. I need to look in detail to see what changed here, but I wouldn't really focus too much on minor changes like that.

      • True but does using Georgians instead of Utahn whites explain the 6-7% difference in Moorjani and Reich's estimates?

        Also, will you be able to develop a new regression formula based on Moorjani's calculations/estimates?

      • I agree Zack. The important thing here is that South Asia had two divergent populations that intermingled 1900 to 4200 years ago.
        Although I wonder if the admixture started occurring 300 years earlier, i.e. 2500 BC, at the height of the Harappa civilization. Many South Indian populations have the NE European and the Caucasian component, which might be skewing the date a bit. What I am saying is that ASI came first, then the Georgian-like folks, followed by the Indo-Aryans.
        Just a theory.

        • The dates are based on a single, instantaneous admixture event. That is of course an idealized situation. Also they use 29 years as a generation. Changes to any of these assumptions can result in a few centuries change.

          I do think that there were at least two different ANI-like waves. The second one mainly affected the northwest and the upper castes.
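
The single-pulse model behind these dates can be sketched as follows. This is a simplified illustration, not the rolloff software: it assumes the admixture LD signal decays as A(d) = A0 * exp(-n * d), where d is genetic distance in Morgans and n is the number of generations since mixture, and recovers n with a log-linear least-squares fit. The function name and the synthetic curve are assumptions for illustration.

```python
import math


def fit_generations(distances_morgans, ld_values):
    """Fit log(LD) = log(A0) - n * d by least squares and return n."""
    logs = [math.log(v) for v in ld_values]
    mean_d = sum(distances_morgans) / len(distances_morgans)
    mean_log = sum(logs) / len(logs)
    cov = sum((d - mean_d) * (y - mean_log) for d, y in zip(distances_morgans, logs))
    var = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -cov / var


# Synthetic, noise-free curve for a single pulse 108 generations ago.
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [0.02 * math.exp(-108 * d) for d in distances]

n_hat = fit_generations(distances, curve)
print(round(n_hat, 2), round(n_hat * 29))  # generations, years at 29 years/generation
```

Real LD curves are noisy and the mixture was not instantaneous, so a fit like this only gives a date under the idealized assumptions described above.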

          • I subscribe to this thinking too, but likely only as far as the number "two" is concerned! The first ANI wave is about 1900BC when the Indus Valley is devastated and the ANI move west, east and south. The second ANI wave is much more recent ~700AD - after the Arab invasion of Makran, Sindh, Multan, Malwa, Gurjara.

            I think the 29 years per generation is a little higher than the norm of 25, especially for an ancient period.

  8. Hi Zack,

    I just looked at the PCA plot Dienekes posted. It seems they have Vedda samples! Genetically speaking, probably not a unique or distinctive population (but I could be wrong). Nevertheless, it would be really cool to have them in your data-set. I really hope you will be able to acquire some additional samples from this paper (just for the fun of it)!

    I do have a few questions concerning the paper. I would be really grateful and appreciative of any insight you can provide. Did they consider the substantial East Eurasian-American admixture that certain West Eurasian populations (including all Europeans) have? Or does most of the work on this paper predate that finding? Would this not confound and complicate certain aspects of admixture estimation and D-statistics? I don't have even a small shred of technical knowledge when it comes to these methods, so I am very interested in your thoughts. Also, I was wondering if populations from the Trans-Caucasus region, like the Georgians, have this East Eurasian-American admixture. I think they don't, but my "thinking" is based on pure speculation and assumption. Since you've probably experimented with the data, I really want to hear your thoughts on this specific question. Also, if you've never explored this question (especially in relation to South Asia), do you have any plans to work with MixMapper, TreeMix, or just plain D-statistics?

    • The Eastern Eurasian/Amerindian like admixture in some European populations is one of the reasons I am glad Moorjani et al switched from Utahn Whites (CEU) to Georgians as an ANI proxy. Several West Asian populations have a different (Turkic/Mongol/Siberian) East Eurasian admixture, e.g. Turkish, Adygei, etc. I have not seen any such issue among the Georgian samples.

      I do plan to do a few things other than the simple ADMIXTURE runs I have been doing. Let's hope I find the time to do so soon.

  9. Do we know why the lower limit was raised from the 1200 of their initial proposal to 1900? Additional 13 groups - that should broaden the range not reduce? Or perhaps they removed the non-cline populations such as the Santhals that were lowering the age?

    It ranges from ~4200 for Andhra Pradesh Vysya to ~1900 for the Uttar Pradesh Dharkar, Brahmin, Sindhi, and Pathan. If ANI-ASI admixture first occurred in the south it seems to me quite plausible that until a very recent period there was either no ASI in places like Uttar Pradesh, Sindh, the Punjab, Gandhara etc, or there was ASI present, but no admixture.

    They also can't date entry, and just limit themselves to the time of admixture:
    "It is also important to emphasize what our study has not shown. Although we have documented evidence for mixture in India between about 1,900 and 4,200 years BP, this does not imply migration from West Eurasia into India during this time. On the contrary, a recent study that searched for West Eurasian groups most closely related to the ANI ancestors of Indians failed to find any evidence for shared ancestry between the ANI and groups in West Eurasia within the past 12,500 years (although it is possible that with further sampling and new methods such relatedness might be detected). An alternative possibility that is also consistent with our data is that the ANI and ASI were both living in or near South Asia for a substantial period prior to their mixture."

    Reich proposes this scenario based on the paper: "Only a few thousand years ago, the Indian population structure was vastly different from today," said co-senior author David Reich, professor of genetics at Harvard Medical School. "The caste system has been around for a long time, but not forever."

    Another take is that "India transformed from a country where mixture between different populations was rampant to one where endogamy -- that is, marrying within the local community and a key attribute of the caste system -- became the norm."

    To me, it appears then that there were two major transformations: first a period of absolutely no admixture except some intrusions from the north to the south, then a period of admixture in the north (perhaps coinciding with a rise of Buddhism in the eastern Ganga basin, when even the lowest and uppermost castes got admixed), followed by a revival of caste distinctions in the post-Buddhist period.

    • I am speculating here, so take it with a heaping of salt.

      The ANI were in the northwest or came there and mixed with the ASI. That should generally give us an older date for admixture in the northwest. But multiple admixtures in that region are probably obscuring this.

    • I agree with your reasonings....

  10. can somebody post the ANI/ASI chart you guys are talking about? I can't find it

  11. I wish to reduce the intelligence of this conversation. How is the admixture date inferred in this case?
    ANI + ASI => admixture A
    ANI + admixture A => admixture B
    ANI + admixture B => admixture C
    Each of the admixture events happens at a later date. In this case, what age is assigned to admixture?
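
One hedged way to think about this question: if each pulse of ANI-related gene flow contributes its own exponential term to the LD curve (an idealized assumption, not something taken from the paper's methods), then the decay rate seen at any distance is a weighted average of the individual pulse ages, so a single-date fit lands between the oldest and the most recent event. The function name, amplitudes, and dates below are hypothetical.

```python
import math


def apparent_generations(distance_morgans, pulses):
    """Local decay rate of a sum of exponentials A_i * exp(-n_i * d).

    pulses is a list of (amplitude, generations_ago) pairs.
    """
    terms = [(amp * math.exp(-n * distance_morgans), n) for amp, n in pulses]
    total = sum(weight for weight, _ in terms)
    return sum(weight * n for weight, n in terms) / total


# Hypothetical three-pulse history: 140, 100 and 60 generations ago.
pulses = [(1.0, 140), (1.0, 100), (1.0, 60)]
for d_cm in (1, 5, 20):
    print(d_cm, round(apparent_generations(d_cm / 100, pulses), 1))
```

In this toy model the apparent age always stays between 60 and 140 generations, and at longer distances the most recent pulse dominates because the older terms have decayed away.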

  12. Hi another question. Where are you seeing 108 generations, I see 144 generations for Dravidians, in the paper.

  13. Zack and Razib, I need your opinion on this. Initially the paper mentions the pre-Neolithic status of mtDNA haplogroups U2 and U7 in South Asia and Iran. Ancient DNA testing has confirmed that U was a feature of the West Eurasian Paleolithic. Mesolithic Europeans have tested as U4, U5 and U2. The Kostenki specimen from Russia dates to about 30 kya and belongs to U2. Similarly, a 30 kya back-migration to North Africa has been suggested for U6. Later, in the Neolithic, U types were replaced by HV and TJ types in West Eurasia, spanning Europe, the Middle East, North Africa and parts of South Asia. Autosomally, the Neolithic wave is best represented by the Caucasian, SW Asian and Baloch components. So, getting back to autosomal components: doesn't the presence of U2 and U7 besides M in India suggest that the South Asian component at the least is a merger of two ancient components? One from Paleolithic West Eurasia and the other from around the Bay of Bengal? Metaphorically, the birth child of the Australo-Melanesian-looking Kostenki and an Andaman Islander, LOL.

    • how much of ASI is related to Andaman Islanders?

    • Oops, I meant to say the paleolithic U types were replace...

    • U2 phylogeny is well developed in South Asia. Many of the U2 strings are just not seen anywhere else. We can simply look at the control region mutations and figure out the age applying a 0.0043 mutations per generation (pedigree based). Per Palanichamy "unless further screening of the Near Eastern mtDNA pool would exhibit early offshoots of U2a, U2b, or U2c, an entry of U2 in India more recent than 40 kya is not plausible."

      Till now the clocks for Y-DNA were mainly STR based, and folk (Dienekes and his 1/3 correction comes to mind) were just fitting the clocks to their own pet theories. Nevertheless, with full sequence Y scans we now have tools available for SNP based dating. Which means we should be able to more precisely date mutations such as R1a1-L657 which are mainly limited to South Asia and its periphery.

      Two recent papers used the SNP approach and both pushed back the ages of Y-SNPs, one significantly.
      Poznik et al.: "Our findings suggest that, contrary to previous claims, male lineages do not coalesce significantly more recently than female lineages."
      Francalacci et al: "We calculate a putative age for coalescence of ~180,000 to 200,000 years ago, which is consistent with previous mitochondrial DNA–based estimates."

      While these papers did not have L657 to test, full Y scan for this line is pending and we should soon have a good idea of the age of R1a1-L657 in South Asia.

      And if we can date the Y and mtDNA lines precisely, we will have a reasonable age for the autosomal components that correlate.

  14. Indo-Aryan Assimilation Theory (AAT) | Brown Pundits - pingback on August 9, 2013 at 5:31 am
  15. Okay everybody, the most important question: WHAT DO THESE RESULTS SAY ABOUT THE INDO-EUROPEAN QUESTION?
    (A scientific answer is wanted).
    Good day.

    • Did you mean the Indo-European Language? Because if Rajesh Rao's hypothesis is correct and that the Harappan/Indus Valley Civilization spoke a form of the Dravidian language, then it would seem that their decline and the intro of the Indo-European language in the area occurred around the same time.

  16. @Saki
    Just go in this blog and become human once again-
    http://new-indology.blogspot.in/
    Good day.

  17. I think the pre-Neolithic element in Indians called ASI may in fact be a hybrid element. Both tribes and castes have mtDNA U2/U7 ranging between 10 and 20% alongside more indigenous elements. Also, Y-chromosome F3 in West Eurasia has recently been linked to H. So it's quite possible that H and U entered India quite a long time ago, say 30 ka, and fused with the Eastern elements that were already there.

    Do we see this anywhere else? Yes: in Northern Europe, the Sami people, among others, are a fusion of old European lineages and old Asian ones, i.e. U5 + Z and I1 + N1c.

    Other evidence is that the South Asian component is relatively equidistant between Asian and West Eurasian components. Also when Zack achieved a modal Onge component at K = 11, that element was only a fraction of total ASI. I feel like the Onge component invoked the raw Asian element in ASI ignoring the western element.

    • Ibra, which South Asian component are you referring to? The South Indian component in Harappa Ancestry Project's admixture runs? Or the South Asian component in Dodecad and other admixture calculators?

  18. Interesting data. It would be useful to know the criteria used for the 'Post PCA curation'. The notes for the Pakistani populations (Pashtuns and Sindhis) state '(1) Remove samples and groups that have evidence of recent ancestry from groups other than ANI and ASI based on PCA'. This caused 60% of the Sindhi samples and 40% of the Pashtun samples to be removed. We know that a few of the HGDP Pashtuns exhibit elevated East Asian admixture while some of the Sindhi samples have African admixture. These do appear to be a small number though (<5) and the implication that 60% of Sindhis and 40% of Pashtuns have recent 'non-native' admixture doesn't quite seem right.

    • For Sindhis, it's because of African and/or Baloch admixture. Either makes them not be on the line between ANI and ASI.

      • That makes sense. Although it does beg the question which set of samples represents Sindhis, if the majority have Baloch and/or African admixture...

        Another interesting tidbit is the assertion that the ASI/ANI admixture date for Brahmins appears to be more recent than that of Pashtuns. I'm not sure how to interpret that.

  19. @Zack

    I have read through most of Moorjani’s paper and looked over Reich’s original paper again and have some interesting points I think should be noted. I was hoping you or Razib could possibly address them.

    In Reich’s original paper from 2009, he mentioned that estimating the proportions of ANI and ASI ancestry in India is challenging since the Reich lab was unaware of any published methods that produce unbiased estimates of mixture proportions in the absence of accurate ancestral groups. Originally, Reich and his team used the Utah Whites (CEU) as a proxy for ANI, and Moorjani found that Georgians serve as a better proxy due to their z-score and closer resemblance to the “original” ANI population/s. Therefore, this would suggest that the ANI or ASI (although they didn’t do these) estimates would be somewhat more accurate than Reich’s estimates after factoring in the change in outgroups as well.

    However, in estimating ANI admixture, Moorjani only used f4 Ratio Estimation. Why is that? In his original paper, Reich mentioned that “We developed three methods for estimating ancestry, which we verified were accurate even in the face of SNP ascertainment bias and some inaccuracies in our phylogenetic model, and which we found provided consistent estimates. The 18 Indian Cline groups all have between 39% and 77% ANI ancestry based on f3 Ancestry Estimates (Methods) which we quote because it has the smallest standard errors (Table 2).”

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2842210/

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2842210/table/T2/

    Because of this, I am skeptical of Moorjani’s methods and intentions with only using f4 Ratio Estimation if it has a higher standard error than f3 Ratio Estimation. Would it have not been better to repeat the f3 Ratio Estimation from Reich’s 2009 study or at the very least include them as well? I’m not sure why but f4 Ratio Estimation also seems to estimate lower ANI ancestry for the vast majority of the samples in comparison to f3 Ratio Estimation and Regression Ancestry Estimation. Could you explain why that is?

    I would also like to point out that Moorjani’s f4 Ratio Estimations (Table S4 on p.14 of the pdf below) using Basques or Abhkasians as outgroups along with Yoruba (YRI) have higher standard errors than Reich’s f4 Ratio Estimations using the Adygei and Papuan as outgroups with the CEU as proxy for ANI. Perhaps it’s not too significant, but I do think it is worth noting.

    http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929713003248.mmc1.pdf

    Furthering on that question, I am not sure why Moorjani did not use all three methods (as seen on Table S5 in the link below) that Reich’s original team developed? Especially, considering the research was done in the same lab? Would that have not provided a more accurate range of estimates rather than just relying on f4 Ratio Estimation?

    http://genetics.med.harvard.edu/reich/Reich_Lab/Publications_files/2009_Nature_Reich_India_Supplementary.pdf

    In addition, I am curious about the standard error for Regression Ancestry Estimation. Reich’s paper didn’t seem to address it, but is it fair to say it is much higher or lower than f3 or f4 Ratio Estimation? Perhaps I am being presumptuous in this, but since Reich quoted f3 Ratio Estimation, I would assume it would be slightly higher. Interestingly enough, Regression Ancestry Estimation also seems to provide the highest ANI estimates for all groups and seems to correlate to an extent with the combination of West Eurasian like admixture components (Baloch, Caucasian, NE Euro, Mediterranean and SW Asian) on Harappa Ancestry Project, especially if you factor in that the South Indian admixture component itself seems to be partially West Eurasian (ANI) admixed. Is there a logical reason or explanation for why that may be? Finally, why do you think Moorjani’s group did not consider using Regression Ancestry Estimation as well?

    I suppose what I am really getting at is why didn’t Moorjani’s group (despite being from the same lab) also publish f3 Ratio Estimation and Regression Ancestry Estimation results like the original paper did?

    • It seems that f4 ratio's standard error depends on using appropriate outgroups. The f4 estimation in all likelihood is more accurate in Moorjani's paper because they used more appropriate outgroups.
      Therefore, the standard error is probably lower.

      • Hi Paul,

        It is not wise to assign ulterior motives to the authors of the Moorjani et al. paper. Anyway, the percentages have not changed that radically. And it is very hard to specify a good outgroup, since most humans are admixed. Living human populations do constitute an interwoven lattice. For example, the Basque are used as an outgroup in this paper, but they are not a "pure" "West Eurasian" population (there don't seem to be any around today). They seem to be around 20% non-West Eurasian (Using MixMapper. But f4 ratio estimation also yields approximately 20-25%, so one can be confident of this result). Also, the San people seem to be a good outgroup to all other living humans, better than the Yoruba. But they also have West Eurasian admixture (up to 15% in some groups). So these things are very tricky. It will be very interesting to see how things look using something like MixMapper (the fact that both f4 estimation and MixMapper put the Basque and Sardinian populations within the 20%-25% East Eurasian range indicates that the results will be comparable). MixMapper was developed by the Reich lab (as far as I know), and they utilized it on European populations, with interesting results. In that paper ("Efficient moment-based Inference of Admixture Parameters and Sources of Gene Flow"), they do mention the HGDP Pakistani populations.
        "Similarly, we found that while Central Asian populations such as Burusho, Pathan, and Sindhi have clear signals of admixture from the 3-population test, they likely have ancestry from several different sources (including sub-Saharan Africa in some instances), making them difficult to fit with MixMapper."

        • Yes, we should not speculate.
          It could easily be that f4 estimation has always been their preference, as long as standard error is acceptable, and the outgroups are better.
          Let's all keep in mind that Moorjani is a graduate student working under Reich's guidance.

        • @HRP0282

          My apologies. I'm sorry if I came across as speculating for ulterior motives. That was not my intention at all. I was just pointing out that Reich stated he quoted f3 ratio estimation in his original paper because he indicated that it had lower standard error than f4 ratio estimation. He also included regression estimation as well. Overall, his paper seemed more in depth in that regard. Hence, I was just curious as to why f3 ratio estimation and regression estimation were left out?

          Yes, the percentages have not changed that radically but there were non-trivial differences between f3 ratio estimation (0.6-3.0% depending on population) and f4 ratio estimation in Reich's original paper. In addition, regression estimation yielded differences ranging from 3-8% or so in terms of percentages when compared to f4 estimation. That's quite a significant margin for certain groups and in general.

          Anyways, I'm sorry for that long post but I just wanted to find out why f3 ratio estimation and regression estimation were left out, especially when it seemed Reich considered f3 ratio estimation more "accurate" and decided to include regression analysis as well. It's just a bit strange considering this was all done in the same lab.

          • It's not strange at all. The Reich et al paper was about ANI-ASI admixture. So they used several methods for computing the ANI percentages.

            Moorjani et al is about refining that and estimating the date of admixture. So they picked the best method they could and tried to make it better and then compute the dates.

            Of the original 3 methods from Reich et al, regression was always the weakest and did not even have any error bounds.

            The f3 ancestry estimation required solving a system of equations. I don't really know why they dropped it, but there can be lots of reasons. It's also possible that it was about equal to the f4 method, and so they just picked one of the two.

            As for the error bounds, that's not the only measure of accuracy.

      • Are you sure? The f4 ratio estimation standard errors are listed in the supplementary data. If Moorjani's paper had better outgroups, the f4 ratio estimation would have had lower standard error than Reich's original paper, no? Looking at Table S4 in the supplementary data (http://download.cell.com/AJHG/mmcs/journals/0002-9297/PIIS0002929713003248.mmc1.pdf), you can see that f4 Ratio Estimation based on the Basques or Abhkasian and YRI (Yoruba) as outgroups actually have higher standard error than Reich's f4 estimation based on the Adygei and Papuan as the outgroups. Although, I do agree with the general consensus that Moorjani's group used a better ANI proxy.

        • After looking at the definition of standard error, it seems that the definition depends on standard deviation. I guess using better outgroup causes higher standard deviation. In other words the samples are separating more from each other in terms of ANI percentage.
          I don't think having higher standard error is necessarily a bad thing.

          • Fair point. It may or may not be a "bad" thing depending on one's perspective. The better outgroups may have created higher standard errors but produced more accurate estimates. I still think it would have been nice to have included f3 ratio estimation though since Reich originally quoted it over f4 ratio estimation. It's not a huge deal but more data doesn't hurt.
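A note on the standard-error discussion in this sub-thread: as far as I know, the standard errors reported for f4-type statistics are usually obtained from a block jackknife over chunks of the genome, not from the spread of ANI percentages among samples. Below is a minimal sketch of that idea, assuming equally weighted blocks (the real tools weight blocks by SNP count). The function name and the toy per-block values are hypothetical and are not taken from either paper.

```python
import math

def block_jackknife_se(block_num, block_den):
    """Delete-one-block jackknife for the ratio sum(num) / sum(den).

    block_num and block_den hold per-block sums of the numerator and
    denominator f4 terms. Unweighted sketch: every block counts equally.
    Returns (point_estimate, standard_error).
    """
    g = len(block_num)
    if g < 2 or len(block_den) != g:
        raise ValueError("need at least two blocks of matching length")
    total_num = sum(block_num)
    total_den = sum(block_den)
    estimate = total_num / total_den
    # Recompute the ratio with each block left out in turn.
    leave_one_out = [
        (total_num - n) / (total_den - d)
        for n, d in zip(block_num, block_den)
    ]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)

# Toy per-block values, made up for illustration only.
toy_num = [0.012, 0.015, 0.011, 0.014, 0.013]
toy_den = [0.020, 0.021, 0.019, 0.022, 0.020]
toy_estimate, toy_se = block_jackknife_se(toy_num, toy_den)
```

Under this view, a larger standard error reflects how much the ratio moves when parts of the genome are dropped, for example when the denominator f4 is small, so it says little about whether individuals differ more from each other in ANI percentage.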

  20. @Zack

    Thanks for the response. I think it answers my question for the most part. The focus was different for this paper.

    "Moorjani et al is about refining that and estimating the date of admixture. So they picked the best method they could and tried to make it better and then compute the dates."

    Is f4 ratio estimation better than f3 ratio estimation though? Reich seemed to think f3 ratio estimation was a bit more accurate (or rather just less inaccurate) and thus the reason for quoting it.

    "Of the original 3 methods from Reich et al, regression was always the weakest and did not even have any error bounds."

    Interesting. Is regression just mathematically more inaccurate than ratio estimation in general? Or is it more inaccurate because it doesn't indicate error bounds? Sorry for asking that. I'm not exactly a calculus/statistics wizard.

    "The f3 ancestry estimation required solving a system of equations. I don't really know why they dropped it, but there can be lots of reasons. It's also possible that it was about equal to the f4 method, and so just picked one of the two."

    Yes, they were fairly similar for the most part but in certain instances, there were differences of about 2-3%.

    "As for the error bounds, that's not the only measure of accuracy."

    Could you elaborate on that a bit? What are the other measures of accuracy in the case of ratio estimation?

  21. They are pretty much the same thing. The main component that is correlated with aboriginal ancestry in South Asian people.

    • Not at all. The South Indian component on Harappa Ancestry Project's admixture runs is majority ASI with minor ANI which pushes it slightly toward West Eurasians rather than East Eurasians. The South Asian on Dodecad and other calculators is heavily ANI influenced but still slightly majority ASI.

      • It depends on the level of the K. Also admixture works best if there are pure populations. The closest thing we have is the paniya at 83% ASI. If you had a pure ASI population to add to admixture then South_Asian = ASI otherwise ASI is a linear transformation of South_Asian component for on-cline populations.

        My conjecture about ASI is that this element itself was the merger of two other elements. One of people in West Eurasia before the neolithic expansion represented by U and another of more Eastern indigenous elements represented by M2 R5 etc..

  22. My thoughts:

    1) ANI shows a divergence of several thousand years from the Caucasian profile. It is therefore unlikely to have been close to it at the time of the first admixture.

    2) Geographical isolation, at the time of the first admixture, between ANI and ASI is quite probable. The Indo-Gangetic plain was heavily forested to the east, and the pre-Thar desert would have formed a southern boundary to the IVC-Mehrgarh-BMAC zone.

    3) The Dravidian admixture occurs at the time that the IVC is beginning to transition into the Cemetery H culture. Settlements begin to move eastward and southward into the rest of India, as environmental change forces the Indus population to find new places to live.

    4) The second admixture occurs at a time when the caste system was at its weakest. Between 300 BC & 300 AD Buddhism was dominant and India was subject to a number of invasions by groups from, or passing through, the Gandhara-Bactria area.

    It makes sense, then, that ANI was the genetic profile of the IVC-Mehrgarh-BMAC populations. This would also be why the Harappa project seems to find a 'Baloch' correlation to ANI.

    The first admixture occurred when the IVC collapsed, sending a fairly large population east and south. In addition, the newly 'Indo-Europeanised' BMAC, Swat & Cemetery H peoples are moving along the same track into the rest of India.

    Admixture ceases as castes begin to form and endogamy takes hold.

    However, when Buddhism was on the rise, a number of groups from the Bactria-Gandhara region invade India. Being Buddhist themselves, they intermarry creating the second admixture point between 200 BC & 200 AD.

    When Hinduism gains dominance, again, and the Guptas begin their rise, caste endogamy becomes more important, freezing admixture in the north.

    What do others think of this?

    • That's an interesting theory, I guess the next thing is to map Y and MT haplogroups to admixtures and migration patterns.

      • I agree. I suspect that mt M3 might give us some clue to this, as well as the time of divergence between upper and lower caste ydna r1a & h.

        • It would definitely be interesting to map admixture with haplogroups. So far R1a, which is my direct clade, is showing as entering the subcontinent 4000 ybp.

          Makes me wonder if they contributed to the demise of the Harappan Civilization or if R1a was present prior to this.

          So many questions and so few answers. But Zack has been doing a great job with his research, for which I am so grateful.

          • ''It would definitely be interesting to map admixture with haplogroups, so far R1a, which is my direct clade, is showing as entering the subcontinent 4000 ybp. .''
            Sorry but Farmana aDNA will discard that:(
            BTW besides that what is the basis of your conclusion?
            good times....

  23. ==AN ANALYSIS OF aniasi THOUGHTS==
    ''1) ANI shows a divergence of several thousand years from the Caucasian profile. It is therefore unlikely to have been close to it at the time of the first admixture.''
    ANI shows similarity to the Caucasian profile and that relation can be very very archaic also Balochistan component is present in moderate proportions in that area which shows SC Asian influence....
    ''2) Geographical isolation, at the time of the first admixture, between ANI and ASI is quite probable. The Indo-Gangetic plain was heavily forested to the east, and the pre-Thar desert would have formed a southern boundary to the IVC-Mehrgarh-BMAC zone.''
    The Question is was SSC(IVC+Mundigak)+Mehrgarh+BMAC ANI dominant? my bet is yes and about 90% ANI at ~2000BC.
    ''3) The Dravidian admixture occurs at the time that the IVC is beginning to transition into the Cemetary H culture. Settlements begin to move eastward and southward into the rest of India, as environmental change forces the Indus population to find new places to live.''
    Yes but SSC was not Dravidian......
    ''4) The second admixture occurs at a time when the caste system was at its weakest. Between 300 BC & 300 AD Buddhism was dominant and India was subject to a number of invasions by groups from, or passing through, the Gandhara-Bactria area.''
    After Ashoka yes, but before that I don't think there were many successful invasions by any means.....
    ''It makes sense, then, that ANI was the genetic profile of the IVC-Mehrgarh-BMAC populations. This would also be why the Harappa project seems to find a 'Baloch' correlation to ANI.''
    I agree, Baloch is likely Paleolithic/indigenous and appears to give admixtures outside...
    ''The first admixture occurred when the IVC collapsed, sending a fairly large population east and south. In addition, the newly 'Indo-Europeanised' BMAC, Swat & Cemetery H peoples are moving along the same track into the rest of India.''
    What do you mean by ''the newly 'Indo-Europeanised' BMAC, Swat & Cemetery H peoples''?
    ''Admixture ceases as castes begin to form and endogamy takes hold.
    However, when Buddhism was on the rise, a number of groups from the Bactria-Gandhara region invade India. Being Buddhist themselves, they intermarry creating the second admixture point between 200 BC & 200 AD.
    When Hinduism gains dominance, again, and the Guptas begin their rise, caste endogamy becomes more important, freezing admixture in the north.
    What do others think of this?''
    How old do you think is caste?
    Good day.

  24. Many Indian populations (Austroasiatic speakers) also seem to be on a cline between South East Asian (SEA) and ASI. Has anyone ever attempted to find out the date of this admixture? Or, better question, can someone do it? 🙂

  25. Out of curiosity does rolloff make any assumptions about the mutation rate? Why were they running coalescent simulations with the mutation rate 2*10^-8 bp/gen? Also doesn't rolloff have a 2X time dependency with StepPCO and HAPMIX?

  26. What are the ANI percentages in other populations like Kalash, Burusho, Tajiks, Baloch, etc.?

    • Because they don't fall on the Indian cline, it's not simple to calculate their ANI/ASI percentages.

      • How do you figure out who falls on the Indian cline? Because on the spreadsheet, Kalash and Burusho have a similar amount of South Indian compared to the 23 Pashtun samples.

        • Reich et al and Moorjani et al do it by doing a PCA.

          I did a rough estimate for all South Asian populations based on the Reich et al data, which is what you are referring to. There I am interpolating and extrapolating from Reich et al results.

  27. Hello Zack,

    Oh, it didn't even occur to me that PCA plotting is how they modeled the Indian cline. I really need to improve my reading skills. Isn't this somewhat problematic in the context of local population differentiation? I thought PCA is really good for African vs West Eurasian vs East Eurasian, but don't things get more complicated at a local level? The Reich lab once wrote in one of their papers that geographically close populations (which are also very closely related in terms of biogeographic ancestry) can have rather different positions on PCA plots, due to a lack of shared genetic drift on their common ancestral lineage (this was written in one of their "Reconstruction..." papers, but I can't recall exactly which one, probably the Native American paper). Isn't this observation exceedingly pertinent in the case of the Kalash and Burusho, and especially the former, due to inbreeding and drift? The Burusho seem, in most Admixture/Structure runs, to be carbon copies of the HGDP Pashtun samples, with the only difference being that the Burusho are substantially East Eurasian shifted. Whenever the Kalash-modal component is avoided, the Kalash appear to be identical to the HGDP Pashtuns, and they seem to cluster right next to the HGDP Pashtuns in neighbor-joining trees. Basically, what I'm saying is that although the Burusho are off-cline due to East Eurasian admixture, the Kalash really seem to be inbred, highly-drifted copies of the HGDP Pashtuns. Based on this alone, I'd assume most populations in Northern-Northwestern Pakistan are very similar.
    So, my question really is (I know I've asked too many already, but you can ignore all of them, the main question I want to ask is this), if we ignore the PCA plots, doesn't most of the evidence suggest that the Kalash are as South Eurasian as the HGDP Pashtuns, but just significantly more homogeneous in terms of the consistency of South Eurasian admixture (HGDP Pashtuns are rather heterogeneous in this regard, as HGDP00214 seems to overlap with Tajiks, and 2 individuals are rather South Eurasian shifted for even Punjabi people)?

    • For the ANI+ASI estimation, you need to exclude samples that have more complicated admixture. For that purpose, a PCA is a good inclusion mechanism. We know that Burusho have some East Eurasian ancestry. That means we can't model them as a simple ANI + ASI mix. Kalash have a different problem due to genetic drift.

      As for Kalash and Burusho being closer to Pathans, I refer you to my ChromoPainter analyses.

  28. Zack, when you get the time, is it possible for you to post the individual Kalash and Burusho results? Thank you.

  29. This was basically covered in 2011 -- in a book entitled "Grandpa Was A Deity: How a Tribal Assertion Created Modern Culture"

  30. Aratta – Sivilisasjonens vugge - pingback on November 2, 2013 at 5:49 am
  31. I have been experimenting with Admixtools lately. I don't have the Onge dataset yet (hopefully will have it soon), but the Dai seem to be a good substitute for Onge. After filtering the outliers out of the Sindhi samples, using the formula alpha = f4(yoruba, basque, sindhi, dai)/f4(yoruba, basque, georgians, dai), I end up with an ANI% of 63.87. This is pretty close to the Moorjani result. If I don't exclude the outliers, the Sindhi average drops to 50% or so.
    Here are some other interesting results:
    Pathan : 69.87 ( I did not filter out anybody here) --Again similar to the Moorjani average.
    Punjabi Arain : 62.19 (Did not filter out anybody here)
    Gujarati Brahmins : 61.68 (Here I am using only individuals with > 10% Northern Euro per harappa)
    Nepali Brahmins : 60.11 (Using individuals with >10% Northern European per Harappa). Here I also included my own sample.

    Using Northern European to represent B (in the F4 formula), you get bloated numbers due to the Northern Eurasian admixture in Northern Europeans.

    In reality, it is pretty tough to exactly calculate the ANI% due to the fact that everybody in West Asia is mixed.
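For readers who want to see what the formula in this comment computes, here is a minimal sketch of an f4 ratio built from per-SNP allele frequencies. It is not Admixtools code. The function names and the toy frequencies are hypothetical, and the target population is constructed as an exact mix of the Georgian and Dai toy frequencies, which is a much simpler setup than the tree model used in the papers.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    if not a or not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("frequency lists must be non-empty and equal length")
    total = sum((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d))
    return total / len(a)

def f4_ratio(outgroup, west, target, east, ani_proxy):
    """alpha = f4(outgroup, west; target, east) / f4(outgroup, west; ani_proxy, east)."""
    return f4(outgroup, west, target, east) / f4(outgroup, west, ani_proxy, east)

# Toy allele frequencies at five SNPs (made up for illustration).
yoruba = [0.10, 0.80, 0.35, 0.60, 0.25]
basque = [0.40, 0.55, 0.70, 0.20, 0.50]
georgian = [0.45, 0.50, 0.65, 0.25, 0.55]
dai = [0.15, 0.90, 0.20, 0.70, 0.10]

# Assumed simplification: the target is an exact 64/36 mix of the two proxies.
true_alpha = 0.64
target = [true_alpha * g + (1 - true_alpha) * d for g, d in zip(georgian, dai)]

alpha = f4_ratio(yoruba, basque, target, dai, georgian)
```

In this toy setup the ratio returns the mixing proportion because every numerator term is exactly `true_alpha` times the matching denominator term. With real data the estimate also depends on how well the chosen populations fit the assumed tree, which is the point made above about Northern Europeans inflating the numbers.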

    • If you look at Supplement 2 of the Reich paper and check out Note S4 Figure 2, it seems to me that the Dai are closer to ASI than even the Onge. Just calculate distance between Dai-ASI and Onge-ASI
      Dai-ASI is 0.031 + 0.045
      ASI-Onge is 0.031 + 0.088

      In any event, using Han as a substitute to Onge/Dai also yields similar results.
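The distance comparison in this reply is just a sum of branch lengths along the path between two populations. A tiny sketch, using only the numbers quoted in the comment (as read by the commenter from Note S4 Figure 2; the variable names are mine):

```python
# Branch lengths as quoted in the comment above.
shared_branch = 0.031
dai_branch = 0.045
onge_branch = 0.088

dai_to_asi = shared_branch + dai_branch    # path length Dai-ASI
onge_to_asi = shared_branch + onge_branch  # path length ASI-Onge

dai_is_closer = dai_to_asi < onge_to_asi
```

That gives roughly 0.076 for Dai-ASI against 0.119 for ASI-Onge, which is the sense in which the Dai look closer to ASI on that figure.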

    • Two more results and I'm done. Same formula (using Yoruba, Basque, Georgians, South Asian, Dai).
      Kalash: 75.78% ANI
      Burusho: 62.17 %

      The Kalash probably have the highest ANI% in all of South Asia.

    • Great! You should link to some of your analyses! Also, if you're interested in my raw-data, what's your handle at 23andMe? I could email you my raw-data.

      • Also, that makes sense, since ASI ancestry is just East Eurasian ancestry.

      • My handle is Nepalese Man. Right now, I can only add 25,000 or so SNPs from 23andMe to my dataset. So I'd say don't send me your dataset yet, because the results will not be as accurate as you would like. Wait a week or so. I've been experimenting for only a week.
        Also, here is why the 2009 Reich paper is less accurate than the 2013 Moorjani paper. Here I am calculating Karitiana-like admixture in Northern Europeans. The populations I am using are Papuan, Yoruba, Utahn Whites, Georgians and Karitiana.
        % East Asian I get is 10%. It's even higher for the Russians.
        Now, replace Utahn Whites with Sindhis. I get 37% Karitiana-like admixture. So it seems to me that using Utahn Whites overestimates ANI because the Karitiana-like admixture is ASI-like.
        Nonetheless, I feel that some of the Karitiana-like admixture is ANI, since it came to South Asia from people with substantial Northern European admixture.
        What Moorjani and Reich call "ANI" is in reality West-Eurasian only.
        This is just my opinion though.

        • Very fascinating! Have you tried calculating West Eurasian ancestry for the Behar Iranians? I'd be very interested in how they turn out.

          Also, since Lipson et al found that Basques and Sardinians are at least 20% East Eurasian, is there any way you can use another predominately West Eurasian population? Basically, a West Asian population with an extremely weak European affinity, and no real recent East Asian or African admixture?

          A big question, did you include HGDP0220 in the Pashtun population? If yes, can you exclude this individual, and recalculate, since HGDP0220 has very substantial East Eurasian admixture?

  32. Thanks!

    • Looks like I don't have the Pathan 220 sample to begin with. However, I removed HGDP 239 and 237 Pathan samples, which yields ANI% of 70.81%.
      I also calculated the Iranian "ANI%" using the Reich et al formula, and I get a value of 75.2% for Iranians. I am using Georgians again as B in the F4 formula.
      If there are outliers in the Iranian samples that you know about, I will remove those.
      I also replaced Basque with Armenian and calculated ANI% for Pathan. This yields ANI of 70.6. Again, 239 and 237 have been removed.
      So using Armenian did not drastically change the results.

      • Thank you very much!

        I'm quite surprised. The gap between the HGDP Pashtun and Iranians is only a meager 4%-5%, while the gap between the HGDP Pashtun and Punjabi Arain is 8%? Rather surprising (although, 8% is still small).

        In your view, how much of the non-West Eurasian admixture among the HGDP Pashtuns is an "ASI" form of East Eurasian, and how much is a Turkic-Siberian form of East Eurasian? Or, do these methods not allow this kind of resolution?

        Another interesting question is how East Eurasian the "ANI" themselves were. I'm engaging in a lot of false "misplaced concreteness" here, and treating as fixed categories things that are really fluid and undefined abstractions, but what the heck, it's fun ;-)! Although, accepting all of this for what it is, I wouldn't be surprised if the "ANI" themselves were around 10%-20% East Eurasian. If that is taken into account, the extent of "ANI" ancestry among South Asians would substantially increase.

        Again, thank you very much for sharing your experiments with Admixtools. For those of us without the proper operating systems and computational muscle, it is always welcome that people share such information.

        When you start experimenting with the tool again, can you try this with Turkish samples?

        • Also, I believe I found two outliers among the Iranians. GSM536752 and GSM536746. Also, an interesting tweak could involve excluding HGDP00232, HGDP00258, and HGDP00230 from the Pashtun population, just like you excluded HGDP00239 and HGDP00237. These individuals tend to cluster with Punjabis/Northwest South Asians in MCLUST, as well as in ChromoPainter/FineStructure. The other Pashtun samples tend to constitute a single cluster. Once you remove the Iranian and Pashtun outliers, I'd be interested to see any changes. Such a small gap between Pashtuns and Iranians strikes me as odd, given what we see in PCA and Admixture. If Pashtuns are 70% West Eurasian, I expected Iranians to be at least 80%. So, I wonder if these two Iranian outliers are skewing things?

          Another nice experiment could involve the Tajiks.

          • Okay, I removed a bunch of outliers from the Iranian population. Here are the results:
            1. Iranian: 84.06% (I ran the program on each sample one by one).
            2. Pathan: 70.94% (it did increase a little bit)
            3. Turks: 94.7%

            The ANI % for the Pathan outliers is 66.26. So a pretty big difference.

            Yes, I agree that some of the ANI is actually East Eurasian. Even Zack's results point to that (Beringian- and Siberian-like components in Punjabi Jatts, for example).
            I also think the actual "ANI" is higher than Moorjani reported.

          • The % West Eurasian for Iranian outliers was only 58.86. The Iranian samples seem very diverse based on the results. I actually also removed a sample that was 95% West Eurasian. The majority of samples are in the 80-90 range.

  33. I also calculated West Eurasian % for Armenians using the same Moorjani formula, which yields 98.4%.
    This is it for now. Will do more later when I have the chance.

    • Can you calculate ANI% for the Kashmiri samples of Moorjani and Harappa, without including the Kashmiri Pahari samples? Thanks.

      • The Moorjani samples are all Kashmiri Pandits. I don't have access to Harappa data.
        Professor Reich told me he would provide me with the data. It's been a week and I haven't received them yet. Time to shoot another email.

  34. Thanks Curious!

    I'd venture that 10% of the non-West Eurasian ancestry of the HGDP Pashtuns and neighboring Indo-Aryan populations is actually a vestige of the "ANI".

    Have you tried anything with the Brahui?

    • I am getting weird results for the Brahui. Only 60.8% ANI. 62.5% for the Baloch.
      I have seen a blog post by Dienekes, where he mentions that the Baloch/Gedrosia component breaks down into Caucasian plus Siberian.

      Also, only 45% ANI for the Makrani.

      I think I know what's going on. The Yoruba are supposed to be an outgroup. Since these groups may have some Sub-Saharan African admixture, the results that I got are probably not accurate.

      • Very interesting. I think you're absolutely correct, the Sub-Saharan African admixture complicates things.

        Nevertheless, there might be something more at play. Don't the Baloch have more Sub-Saharan admixture than the Brahui? I was under the impression that the Baloch are a more cosmopolitan spin on the Brahui in terms of genetic affinities.

        So, if we want to use HarappaWorld as an example, we can say non-West Eurasian admixture is to be found in the "Baloch" component, (which I'm sure has a slight "ASI" shift, and maybe other complicating East Eurasian affinities), the "South Indian" component (almost evenly split between West and East Eurasian, but slightly more Western than Eastern), the "NE European" component (a heavy chunk of Amerindian-related admixture is "hidden" within it), and then the trace percentages of East Eurasian components we also find (NE Asian, Siberian, Beringian, etc). Not really comparable, two very different methods here, but I think we can say this in the context of Admixture.

        • Don't linguists believe that the Brahui actually came from somewhere in India? If that's the case, you can expect them to have lower ANI than the Baloch.
          However, the HAP results were making me believe that the Brahui are the original Dravidians and that the linguists are wrong.
          Yeah, I agree, the "components" aren't really "pure" so to speak.
          I will do some filtering when I get the chance and see if there are outliers in my samples that are causing problems.

    • Okay, I replaced Yoruba with Biaka Pygmy. Only a slight improvement. I notice that the Harappa database has no outliers for these three groups? The problem may just be a lack of SNPs. Only 27k SNPs are being used in my dataset for these three populations.
      Results with Biaka Pygmy as the outgroup:
      Brahui: 62.8%
      Baloch: 64.5%
      Makrani: 48.1%

      I am using a public dataset available online. Ref.bed to be precise. I think Razib linked to it in 2011.

      • You make a great point here, the low number of SNPs might definitely play a part in this. But I'm still confused by the Brahui-Baloch difference. I always expected the Brahui to be more West Eurasian than the Baloch.

        Do any changes occur with other populations when using the Biaka? Or are results pretty robust to outgroup changes?

        • The results vary by a tiny bit even if I switch from Yoruba to San. It's still pretty consistent. Three results using Biaka as the outgroup:
          Sindhi: 66.48%
          Pathan: 73.9%
          Punjabi Arain: 65.5%

          Basically a difference of 3%.
          The Sindhis have some African admixture even after I do some filtering. I tested it a few days ago. I think I was getting 0.5% African for the regular samples.

          • Cool, I have a feeling that using the Biaka as an outgroup gives more "accurate" results, since the Biaka are much more diverged from Eurasians than the Yoruba are. Their greater divergence means "cleaner" results, wouldn't you agree? The San are even more diverged from Eurasians, but they have some West Eurasian admixture that could complicate things. As far as I'm aware, the Biaka are, at least for our purposes, "unadmixed", and almost as distinctive as the San, so aren't they the best outgroup?

  35. HRP282, if San has West Eurasian admixture, then that must be why using San as outgroup gives lower ANI.

    Three results:
    Pathan: 68.1%
    Sindhi: 60.5%
    Punjabi Arain: 60.3%
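
    For a side-by-side look at how much the outgroup choice moves these numbers, here is a minimal Python sketch. It only re-tabulates figures already quoted in this thread (the Biaka values from the earlier reply and the San values just above). The dictionary and function names are made up for illustration, and nothing here re-runs an Admixtools analysis.

```python
# ANI % estimates quoted in this thread, keyed by the outgroup used.
# These are copied from the comments, not recomputed from genotype data.
biaka = {"Pathan": 73.9, "Sindhi": 66.48, "Punjabi Arain": 65.5}
san = {"Pathan": 68.1, "Sindhi": 60.5, "Punjabi Arain": 60.3}


def outgroup_gap(a, b):
    """Per-population difference in ANI % (a minus b), in percentage points."""
    return {pop: round(a[pop] - b[pop], 2) for pop in a}


gaps = outgroup_gap(biaka, san)
for pop, gap in gaps.items():
    print(f"{pop}: Biaka {biaka[pop]} vs San {san[pop]} -> gap {gap} points")
```

    On these three populations the San-based estimates come out roughly 5 to 6 points below the Biaka-based ones.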

    • This is quite a beautiful result. Extremely interesting, especially in light of the Reich lab finding that the San can have West Eurasian admixture on the level of 15%. A while back, Dienekes was working with different ascertainment schemes in Admixture, and I recall he found that the Yoruba also had some Eurasian admixture (I'm not sure if the Reich lab found anything similar, but their focus was more on South African hunter-gatherers). If I recall correctly, the "Pygmies" were the most "African".
