Taking Suggestions

Posted by Zack on May 22, 2011

What would you want from this project? What sort of analyses would you like me to do?

I know several of you want regional admixture/PCA analyses and those are coming starting next week.

In addition to that, is there something specific you would like to be investigated?

For example, is there some specific supervised admixture you would like me to run? A specific PCA/MDS analysis?

Or do you want me to try to synthesize all the results we have gotten into some sort of coherent theory instead of throwing out the numbers like I have been doing?

Miscellaneous, suggestions


47 Comments.

reiver May 22, 2011 at 1:48 pm

I don't know if this is possible, but something more at a city level of granularity would be interesting.

(Note, I'm aware that if people don't "breed locally" then this is probably nonsensical.)
- Zack May 22, 2011 at 5:12 pm
  
  City level analysis would require careful DNA collection in my opinion. I am relying on expats here, so it would be hard.
AV May 22, 2011 at 2:07 pm

"Or do you want me to try to synthesize all the results we have gotten into some sort of coherent theory instead of throwing out the numbers like I have been doing?"

That'll be great, considering the excellent and insightful discussion it tends to fuel, like on Brown Pundits. Regional analyses would also be great, but I wonder whether a specific state-based one or a pan-regional one (best way I can describe it), for example a South Indian or North-West South Asia specific analysis with the appropriate reference sample individuals, would be more fruitful, since we could then compare, for example, Brahmins across several localities with the non-Brahmins of their own state and of the other states. Do you plan on running that reference data-set you acquired a while ago (the Pan Asian data-set from the paper "Mapping Human Genetic Diversity in Asia"), or do the SNPs tested correlate poorly with those of 23andMe and FTDNA?
- Zack May 22, 2011 at 5:14 pm
  
  The Pan-Asian dataset has only about 14,000 SNPs in common with 23andme and my other sources. So admixture analysis is going to be very noisy. I do want to see what PCA would show.
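The overlap figure quoted above comes from intersecting the marker lists of the datasets being merged. A minimal sketch of that counting step follows, assuming (this is not stated in the thread) that each dataset ships a PLINK-style `.bim` file with the SNP ID in the second column. The function names and the toy rows are hypothetical and exist only for illustration.

```python
def read_snp_ids(bim_lines):
    """Collect SNP IDs (second column) from the lines of a PLINK-style .bim file."""
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed rows
            ids.add(fields[1])
    return ids


def shared_snps(*id_sets):
    """Return the SNP IDs that are present in every dataset passed in."""
    return set.intersection(*id_sets) if id_sets else set()


# Toy rows: chromosome, SNP ID, genetic distance, position, allele 1, allele 2.
# The IDs are placeholders, not real markers from any dataset named here.
panel_a = ["1 rsA 0 1000 A G", "1 rsB 0 2000 C T", "2 rsC 0 3000 G T"]
panel_b = ["1 rsB 0 2000 C T", "2 rsC 0 3000 G T", "3 rsD 0 4000 A C"]

common = shared_snps(read_snp_ids(panel_a), read_snp_ids(panel_b))
print(len(common), sorted(common))
```

With real files, the lines would come from `open(path)` instead of the toy lists. The size of `common` is the number that decides whether an admixture run is likely to be too noisy.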
razib May 22, 2011 at 3:02 pm

i don't think it is optimal that you spend TOO much time on analysis, just the basics. you prolly have some intuitions from running ADMIXTURE on this data set we don't know. you're better at knocking down speculation i would think in terms of value add 🙂
Azad May 22, 2011 at 8:44 pm

Zack...is it possible for you to understake an admixture analysis comparing us to turkic populations?
- Azad May 22, 2011 at 8:44 pm
  
  undertake*
  - AV May 22, 2011 at 9:49 pm
    
    Azad, you Turkophile, you!
    
    On a more serious note, wouldn't a South Asia vs Central Asian comparison simply result in "wrap-around" affinities since both populations have a West-Eurasian/non-West Eurasian cline?
    - Azad May 23, 2011 at 8:21 pm
      
      Hahaha, studying about turkism is very interesting, I can't help it!!
      
      I see what you mean though; central asian admixture itself is very mixed due to cultural melting pot that Central Asia was.
      But what if we're compared to Siberians, Mongolians etc instead? Zack, what do you think?
Simranjits May 22, 2011 at 9:28 pm

I would be interested in testing the hypothesis that some of the tribes in the north west descend at least in part from eastern Scythians (Sakas). Some of these include Pathans, Jatts, Rajputs and more. This might of course only be visible at higher K's.
- razib May 22, 2011 at 9:51 pm
  
  how? i.e., what is your reference pop for sakas? ossetians? perhaps tajiks, who might be similar?
- AV May 22, 2011 at 10:07 pm
  
  I was thinking about just this yesterday, incidentally, and I was wondering whether the elevated European scores among the few Jatt participants we have in the project might be a by-product of mixing with the Sakas? Anyone who has been on Jatt-related portals and websites would have come across some articles wherein certain Jatt gots claim descent from Saka invaders. Considering the slightly more elevated European scores the three Jatts of HAP have, perhaps the assertion that some Jatt gots have Scythian progenitors should not immediately be written off as a claiming-foreign-origins exercise? Yes, this would definitely be worth investigating.
  
  In addition to that, some speculate that Y-DNA G2a3b* and G2a3b1, which is modal as far as South Asia is concerned among the Tamil Brahmins and Gujarati Brahmins at a frequency of 13% and 10% for the Iyengars and Iyers; and 10.9% for the Gujarati Brahmins respectively may well be an artifact of the Indo-Scythian empires, perhaps via a connection to the priests of those empires. Perhaps it'd be worth a shot investigating the aforementioned groups in a similar manner? The author of this page, a G2a3b1 Y-DNA individual himself, has some speculations (and only speculations, mind you, this shouldn't be the final word on anything) with regards to the Indo-Scythians-
  
  -"Another possibility is that [my] ancestors stayed in the Gujarat area following the decline of the Saka and Indo-Parthian empires, perhaps as priests serving Somnath temple, dedicated to god Shiva. Some priests of the Saka kings (called Magas, a term that denotes Zoroastrian priests) converted to Brahminism after coming to India (there is documented evidence of conversion of 18 priests). These converts became scholars of Sanskrit; the Indo-Scythian rulers were the first in India to introduce Sanskrit as the official language for State-related communication (before second century CE, there is no evidence of use of Sanskrit in India for official State business). When Somnath temple was first destroyed in 725 CE by the Arab governor of Sind, or when the re-built temple was destroyed by Mohammed Ghazni in 1024, many of the priests fled Gujarat, some to Tamilnadu, at the invitation of the Chola kings. There is a historical record of migration of priests from Somnath temple to Tamilnadu during this period. I believe my ancestor is not very likely to have come South via this migration. A key reason is that the proportion of G2a3b1 among Iyers (worshippers of Shiva) and Iyengars (worshippers of Vishnu) is roughly the same (see below in the graph). Had my ancestor come after the raids on Somnath temple, there would really be no reason for him to convert to Vaishnavism from Shaivism after migrating South, when Shaivism was already flourishing in the South. It is more likely that my ancestor came with the earlier migrations of the remnants of the Indo-Scythian empire, and was either a worshipper of Vishnu, or a follower of Buddha. Some of these migrants converted to Shaivism (as evidenced by substantial presence of G2a3b1 among Iyers), but many would have chosen to remain worshippers of Vishnu."
  
  More on this site - Source
  - AV May 22, 2011 at 10:09 pm
    
    Just to clarify for the non-Browns and the Desi-Americans here, Got = the Punjabi equivalent of Gotra, IIRC.
    - Garvan May 23, 2011 at 2:27 am
      
      What does "non-Browns" mean? I have seen the term used before but I am left wondering what Indian populations this refers to, if indeed, the term is restricted to Indians.
      
      Garvan
      - AV May 23, 2011 at 3:23 am
        
        Don't take it seriously, its usage is in jest for the most part. Basically non-South Asians.
    - Zack May 23, 2011 at 7:54 am
      
      Good thing you linked Gotra, otherwise I was gonna ask what Gotra is. Lol
  - Valikhan May 23, 2011 at 11:41 am
    
    Ossetians are Iranian by language only. Genetically they are as local people as Georgians, Chechens and many others.
    - reiver May 23, 2011 at 10:15 pm
      
      @Valikhan: Regarding the Ossetians and Iranians, I came across this before:
      
      http://onlinelibrary.wiley.com/doi/10.1046/j.1529-8817.2004.00131.x/abstract
      
      (It's open access, so you should be able to read it, if you are so inclined.)
    - Zachary Latif May 24, 2011 at 8:48 am
      
      There is no such thing as Iranian by language or genetics.
      
      Iranianism is an esprit de corps of lands that were once influenced by Iran and is not reducible to mere components.
      
      Sorry your comment brought the pan-Iranist in me Loll..
- Parasar May 22, 2011 at 11:06 pm
  
  The northern Sakas were in Yutian/Khotan, the central between Mathura and Gandhara, and southern in Malwa and Gujarat. First we can't say who were the precursor Saka. Assuming that the Khotanese were the precursor Saka who (Moga/Moasa/Maues) entered India via the Karakorum pass, you would expect their descendants to show an east Asian signature.
  - Valikhan May 24, 2011 at 2:59 am
    
    reiver,
    
    Nasidze is wrong in many ways. Especially his Y data is proven to be misleading.
    You should read the 2011 paper by Balanovskiy and Dibirova. They took 1,500 samples from all over the Caucasus, and their conclusion is that Ossetians are as local as other peoples and just adopted an Iranian language.
    - reiver May 24, 2011 at 9:10 am
      
      (Just to confirm,) is this what you are referring to?
      
      http://mbe.oxfordjournals.org/content/early/2011/05/13/molbev.msr126.short
      - Valikhan May 24, 2011 at 9:44 am
        
        It is. Tell me your email and I'll drop this article with supplements.
        Zack knows my email address.
- Parasar September 18, 2012 at 4:59 pm
  
  Four Jats tested - all are L657+
  
  U2321 Amar Sandhu, Jalandhar, Punjab, India L657+
  U2810 Tharn Bajwa, Pakistan, L657+
  N22414 Luddan Singh Ranu, 1800s, Manki, Punjab L657+
  163483 Gurbax Singh Sidhu, born c. 1905 in Dod, Punjab L657+
  
  No evidence as yet of the presence of L657 outside the Indian Subcontinent and Arabia (except one Babasan Kazakh line).
Balaji May 23, 2011 at 2:18 am

Here are my suggestions.

(1) Run a K=2 ADMIXTURE analysis with the same populations as Reich et al. Note that they did not include Balochi and Makrani or the Munda and Tibeto-Burman tribes because these populations did not fall along the "Indian Cline". That component which is highest in the Pathans and Sindhis can be identified as ANI and that which is highest in the Mala and Madiga as ASI. It will be interesting to see if it is possible to reproduce the Reich et al. results using ADMIXTURE.

(2) Run a K=4 ADMIXTURE with all the South Asian populations including Balochis, Makranis and Munda tribals with some African and East Asian populations to take out the effect of African and East Asian admixture. Perhaps this will allow a better separation of ANI, ASI and East Asian.
- AV May 23, 2011 at 3:29 am
  
  ASI was already inferred here, thanks to the presence of the Onge component, just so you know.
  
  Outside of Harappa, Dienekes Pontikos' Dodecad ancestry project has attempted to infer ANI-ASI for the South Asian participants here and here. Of course the number of subcontinental participants here far exceeds that of Dodecad (which is but natural).
Valikhan May 23, 2011 at 11:46 am

What about RHH Mapper then? Since we are all mixed, that could shed some light.
Bliss May 23, 2011 at 10:19 pm

There are a number of things you can do that would make seeing patterns easier:

1. Categorize your lists and charts by ethnicity: all the Tamils, Punjabis etc lumped separately; and then subcategorize by caste.

2. Show results by proportion of components.

3. Put the two largest components on either end.
- Zack May 24, 2011 at 5:49 am
  
  You can do part of #1 by sorting by the ethnicity column in the table that goes along with the participants' admixture barchart.
  
  You can do #2 on the chart by clicking on the legend. In the Google spreadsheet, switch to List View and then you can sort by column.
AV May 24, 2011 at 9:27 am

Running the new K=11 against the old reference I and reference II populations and consequently inferring ASI would also be nice, especially for the Xing et al data-set and other South Asian-specific data sets but I suppose that would take a lot of time, unless I am mistaken. There is also a certain amount of overlap between the populations across the different references, yes? But I suppose those populations, such as the Tamils, Telugu (A.P) and Punjabi Arain individuals will be run at K=11 in case of a prospective regional-analysis.
- Zack May 24, 2011 at 9:32 am
  
  All Reference I populations are included in Reference 3. Xing et al is the only extra dataset in Reference II.
  
  Since Xing et al has a decent overlap with Reich et al, I plan to do some PCA and ADMIXTURE runs of Xing and Reich (with HapMap and whatever else keeps the number of SNPs above 100k included).
Ibra May 25, 2011 at 12:56 am

Perhaps a replication of this experiment with emphasis on removing populations with high ASI to see how admixture proportions change.

http://dienekes.blogspot.com/2011/03/note-of-caution-on-admixture-estimates.html

The latest reference set at k = 7 seems an ideal base for this 🙂
- Zack May 25, 2011 at 8:10 am
  
  I am not sure I get the point of the experiment you are proposing.
  - Ibra May 25, 2011 at 8:53 am
    
    Just wondering about the way admixture works. If you remove high-ASI populations from your set, will the "South Asian component" of the remaining samples increase from before? The reason is that I suspect that if we had pure ASI samples (hypothetical), then in terms of frequency South Asian = ASI for each sample.
Garvan May 25, 2011 at 7:45 am

I would be interested in how the Pan Asian data-set Malaysian Negrito sample relates to the Onge. Can the Malaysian Negrito be substituted for Onge as a proxy for the ASI component of Indian populations? Perhaps the Pan Asian has insufficient populations for this purpose? I don't know the Indian populations very well.

Garvan
- Zack May 25, 2011 at 8:12 am
  
  The Pan-Asian dataset has only 14,000 SNPs in common with my other datasets, which is why I can't include them in my regular analyses. But I do plan to do some analyses on Pan-Asian with Reich et al or HGDP etc included.
Ibra May 25, 2011 at 2:32 pm

Ref4c is a favorite reference set so far. Fst divergences between components with a MDS plot + Fst dendrogram would prove interesting for k >= 9
0015 May 26, 2011 at 9:46 pm

a ethnicity run would be great, i would like to know which indian ethnicity i am. so the theory would be great. thanks
- Zack May 28, 2011 at 10:22 pm
  
  Assigning ethnicity to mixed individuals is a little hard.
  
  In your case, the one measure is your ASI percentage. Since your Onge component was about 6% and you are half Roma, let's double it to 12.7%. Using our regression estimate, that means about 23% Ancestral South Indian, which is about the same or lower than people from Northwestern India or Pakistan.
  
  Now does that mean that your Roma ancestors were from Pakistan? Not necessarily. It is possible (even likely) that your Roma ancestors picked up genes on the way from South Asia to Europe and during their stay in Europe, thus reducing the Indianness of their genes.
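The doubling step in the reply above is a rescaling: if only part of someone's ancestry comes from the population of interest, a component measured over the whole genome is divided by that fraction to estimate its level in that part alone. A minimal sketch is below, assuming the non-Roma side contributes none of the component. The function name is hypothetical, and the 6.35 input is inferred by halving the quoted 12.7%, since the thread only says "about 6%". The regression from Onge percentage to ASI is not reproduced because its coefficients are not given here.

```python
def rescale_component(observed_pct, ancestry_fraction):
    """Scale a whole-genome component up to the share of ancestry that carries it.

    observed_pct:      component percentage measured on the whole genome
    ancestry_fraction: fraction (0 < f <= 1) of ancestry from the population
                       of interest; the rest is assumed to carry none of it
    """
    if not 0 < ancestry_fraction <= 1:
        raise ValueError("ancestry_fraction must be in (0, 1]")
    return observed_pct / ancestry_fraction


# Assumed input: 6.35% is back-calculated from the 12.7% quoted in the reply.
onge_in_roma_half = rescale_component(6.35, 0.5)
print(round(onge_in_roma_half, 1))
```

Passing a different fraction changes the estimate accordingly, so the result is only as good as the assumed ancestry share.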
0015 May 28, 2011 at 12:31 pm

also a human tree would be cool where indians/southasian ethnic groups cluster within the human tree....and what ethnicities are caucasian and what are australoid
- Zack May 28, 2011 at 10:07 pm
  
  A classification tree based on closeness in PCA or admixture results is in the works.
  
  Do note that it is not a phylogeny. Also I have no idea how to divide people up into Caucasoid and australoid.
- 0015 May 28, 2011 at 11:38 pm
  
  Im 60% roma gypsy, since my dad is a quarter gypsy too. So more than half. Can you somehow exclude all non-indian genes and just look at my southasian where it is from? i dont know if that is possible thanks:)
  - 0015 May 30, 2011 at 1:48 am
    
    ?
0015 May 28, 2011 at 12:32 pm

a run including the gypsies of course
- Zack May 28, 2011 at 10:08 pm
  
  I'll include you of course. I think you are the only romany.
0015 May 30, 2011 at 2:23 pm

there are two gypsies which dienekes detected in reich at al i think...among the romanian samples...can you somehow use them? thanks for answer...than we would be 3 gypsies
0015 May 30, 2011 at 2:48 pm

behar et al, sorry

Harappa Ancestry Project

Genetics and South Asia