Tag Archives: harappa

HarappaWorld HRP0298-HRP0311

I have added the HarappaWorld Admixture results for HRP0298-HRP0311 to the individual spreadsheet.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations. Finally, the standard error estimates on these results can be about 1%. Therefore, it is entirely possible that your 1% exotic admixture result is just noise.
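To make the point about noise concrete, here is a minimal sketch in R. The participant values are made up, the helper name `flag_noisy_components` is hypothetical, and the cutoff of two standard errors is my assumption (a rough rule of thumb), not something the calculator itself applies.

```r
# Flag non-zero components that are too small to distinguish from zero,
# given a standard error of about 1 percentage point (as stated above).
# Assumption: anything below z * se is treated as possible noise.
flag_noisy_components <- function(estimates, se = 1, z = 2) {
  names(estimates)[estimates > 0 & estimates < z * se]
}

# Made-up admixture percentages for an imaginary participant.
estimates <- c("South Indian" = 46.00, "Baloch" = 32.99,
               "Caucasian" = 19.99, "Papuan" = 1.02, "San" = 0.00)

print(flag_noisy_components(estimates))
```

With these toy numbers only the small non-zero component is flagged; the large components are well clear of the threshold.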

I have also updated the group averages.

We have an Indian adoptee participant, HRP0303. Their results seem closest to non-Brahmin Tamils.

HarappaWorld HRP0289-HRP0297

I have added the HarappaWorld Admixture results for HRP0289-HRP0297 to the individual spreadsheet.

I also updated the results for HRP0274 using FTDNA Family Finder data instead of the Genographic 2.0 data that was originally submitted. As the Geno2 data has only about 14,000 SNPs in common with my HarappaWorld calculator, it's interesting to see how HRP0274's admixture results change:

Component          Geno2    FTDNA
South Indian       48.68%   46.00%
Baloch             34.22%   32.99%
Caucasian           4.33%    5.02%
Northeast Euro      3.89%    3.57%
Southeast Asian     2.75%    1.06%
Siberian            1.25%    1.87%
Northeast Asian     1.16%    1.69%
Papuan              1.14%    1.85%
American            0.87%    1.23%
Beringian           0.01%    1.23%
Mediterranean       0.00%    0.39%
Southwest Asian     1.69%    3.10%
San                 0.00%    0.00%
East African        0.00%    0.00%
Pygmy               0.00%    0.00%
West African        0.00%    0.00%

The only differences greater than 1 percentage point are South Indian (2.68%), Southeast Asian (1.69%), Southwest Asian (1.41%), Baloch (1.23%), and Beringian (1.22%). It's remarkable that only about 14,000 SNPs could give us such a decent result.
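The comparison can be reproduced with a few lines of R. This is just a sketch of the arithmetic using the two columns of the table for HRP0274; the variable names are mine.

```r
components <- c("South Indian", "Baloch", "Caucasian", "Northeast Euro",
                "Southeast Asian", "Siberian", "Northeast Asian", "Papuan",
                "American", "Beringian", "Mediterranean", "Southwest Asian",
                "San", "East African", "Pygmy", "West African")

# Percentages from the table for HRP0274.
geno2 <- c(48.68, 34.22, 4.33, 3.89, 2.75, 1.25, 1.16, 1.14,
           0.87, 0.01, 0.00, 1.69, 0.00, 0.00, 0.00, 0.00)
ftdna <- c(46.00, 32.99, 5.02, 3.57, 1.06, 1.87, 1.69, 1.85,
           1.23, 1.23, 0.39, 3.10, 0.00, 0.00, 0.00, 0.00)

# Absolute difference per component, rounded to two decimals.
abs_diff <- round(abs(geno2 - ftdna), 2)
names(abs_diff) <- components

# Keep only differences above 1 percentage point, largest first.
large <- sort(abs_diff[abs_diff > 1], decreasing = TRUE)
print(large)
```

Sorting by size puts the South Indian component first, which is where the reduced SNP overlap shifts the estimate the most.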

We have two new Gujarati participants. HRP0292, a Gujarati Jain, seems to be more similar to populations somewhat further south. HRP0294, a Gujarati Sunni Vohra, has results that are more north-oriented and somewhat similar to those of HRP0265 (Gujarati Patel Muslim). Therefore, I have created a separate ethnic category for Gujarati Muslims in my ethnic spreadsheet. I'll have averages for it the next time I compute them.

We have two Indian adoptee participants as well. HRP0297 has results which match well with the Bengalis (other than the Brahmins) in this project. HRP0290's results are somewhat harder to figure out. The closest groups, though none is very close, are probably the Tharu from Uttarakhand and the Satnami from Chhattisgarh (Reich et al dataset). A ChromoPainter analysis would be more useful here.

HarappaWorld HRP0284-HRP0288

I have added the HarappaWorld Admixture results for HRP0284-HRP0288 to the individual spreadsheet.

I have also updated the group averages (weighted) spreadsheet.

HarappaWorld HRP0273-HRP0283

I have added the HarappaWorld Admixture results for HRP0273-HRP0283 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

We got our first Pashtun participants, one Afghan and one Pakistani. Both have very similar results and are not much different from the HGDP Pathan sample average in their South Indian component.

HRP0278, a Bengali (mostly), has more of the East Asian components than any other Bengali participant (including my friend Razib).

HarappaWorld HRP0253-HRP0272

I have added the HarappaWorld Admixture results for HRP0253-HRP0272 to the individual spreadsheet.

I got two participants from the Geno 2.0 Project. While I have calculated their HarappaWorld Admixture results, please note that Geno2 has only about 14,000 SNPs in common with HarappaWorld. Thus these results are very noisy.

HarappaWorld HRP0250-HRP0252

I have added the HarappaWorld Admixture results for HRP0250-HRP0252 to the individual spreadsheet.

However, I have not recomputed the weighted averages for the Kashmiris or Bengali Brahmins. Also, I am not sure about Tamil Gounder. Wikipedia says they are Vellalars, but I don't know if I should report separate Gounder results or include them in the Tamil Vellalar average.

HarappaWorld HRP0245-HRP0249

I have added the HarappaWorld Admixture results for HRP0245-HRP0249 to the individual spreadsheet.

I have also recomputed the weighted averages for Kurds (from 6 to 10 now).

Let's look at the Kurdish results from Yunusbayev (prefix: kurd), Xing (prefix: F) and Harappa (prefix: HRP). Do note that the Xing results were computed with a smaller number of SNPs and thus might be noisy.

HarappaWorld HRP0240-HRP0244

From now on, instead of waiting till I have a batch of 10 new participants to compute their Admixture results, I'll run admixture at the start of the month for those who submitted their data during the previous month.

So I have added the HarappaWorld Admixture results for HRP0240-HRP0244 to the individual spreadsheet.

I have also recomputed the weighted averages for Bengalis (from 3 to 5 now), Kerala Muslims (from 1 to 2), and Georgians (from 3 to 4) while adding a new one for our first North Ossetian participant.

HarappaWorld Oracle

Here's the HarappaWorld Oracle to go with the HarappaWorld admixture results and DIYHarappaWorld.

It works similarly to the old Ref3 Harappa Oracle, with a couple of differences. One, there is no panasian switch since the Pan-Asian dataset is not included in this calculator.

Two, I have added an optional mincount argument. The Oracle calculation uses only those groups that have at least mincount individuals. By default mincount is 2, so only groups with 2 or more samples are used to compute your Oracle results.
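The effect of mincount can be sketched as a simple filter. This is an illustration of the behaviour described above, not the Oracle's actual code: the `groups` table, its column names and the helper `filter_by_mincount` are all hypothetical.

```r
# Hypothetical reference table: one row per population group
# together with the number of individuals in that group.
groups <- data.frame(
  population = c("group_a", "group_b", "group_c", "group_d"),
  n = c(1, 2, 7, 25),
  stringsAsFactors = FALSE
)

# Keep only groups that have at least `mincount` individuals.
filter_by_mincount <- function(groups, mincount = 2) {
  groups[groups$n >= mincount, , drop = FALSE]
}

# Default (mincount = 2) drops the single-sample group.
print(filter_by_mincount(groups)$population)

# A stricter threshold drops the two-sample group as well.
print(filter_by_mincount(groups, mincount = 4)$population)
```

Raising mincount trades coverage for stability: small groups have noisier averages, so excluding them tends to give less erratic matches.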

Let's look at my top 20 Oracle results in mixed mode, excluding population groups with fewer than 4 individuals.

HarappaOracle(c(26.46, 36.82, 14.22, 4.78, 0.00, 1.32, 0.86, 0.04, 0.19, 0.06, 3.63, 8.07, 0.00, 2.44, 0.43, 0.67), k = 20, mincount = 4, mixedmode = TRUE)

[,1] [,2]
[1,] "18.1% egyptian_behar_12 + 81.9% punjabi-arain_xing_25" "2.3361"
[2,] "18.1% egypt_henn2012_19 + 81.9% punjabi-arain_xing_25" "2.5615"
[3,] "80.7% punjabi-arain_xing_25 + 19.3% yemenese_behar_8" "2.8388"
[4,] "18.4% palestinian_hgdp_46 + 81.6% punjabi-arain_xing_25" "2.9944"
[5,] "84.7% punjabi-arain_xing_25 + 15.3% yemen-jew_behar_15" "3.0923"
[6,] "19.1% jordanian_behar_20 + 80.9% punjabi-arain_xing_25" "3.1877"
[7,] "18% egypt_henn2012_19 + 82% sindhi_hgdp_24" "3.4814"
[8,] "17.9% egyptian_behar_12 + 82.1% sindhi_hgdp_24" "3.5554"
[9,] "20.3% jordanian_behar_20 + 79.7% punjabi_harappa_7" "3.6161"
[10,] "18.9% egyptian_behar_12 + 81.1% punjabi_harappa_7" "3.6587"
[11,] "19.5% palestinian_hgdp_46 + 80.5% punjabi_harappa_7" "3.7079"
[12,] "19% egypt_henn2012_19 + 81% punjabi_harappa_7" "3.8303"
[13,] "18.3% palestinian_hgdp_46 + 81.7% sindhi_hgdp_24" "3.8762"
[14,] "80.4% punjabi-arain_xing_25 + 19.6% syrian_behar_16" "3.8908"
[15,] "19% lebanese_behar_7 + 81% punjabi-arain_xing_25" "4.0494"
[16,] "18.9% jordanian_behar_20 + 81.1% sindhi_hgdp_24" "4.078"
[17,] "79.9% punjabi_harappa_7 + 20.1% yemenese_behar_8" "4.1222"
[18,] "15.1% bedouin_hgdp_46 + 84.9% punjabi-arain_xing_25" "4.1522"
[19,] "85.3% punjabi-arain_xing_25 + 14.7% saudi_behar_20" "4.2014"
[20,] "79.1% punjabi_harappa_7 + 20.9% syrian_behar_16" "4.2191"

These results are closer to my actual reported ancestry than the ones from the Ref3 Oracle.
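For readers wondering what a mixed-mode line such as "18.1% A + 81.9% B" with a score next to it could mean, here is a toy sketch. It rests on assumptions of mine: that the score is a Euclidean distance between your admixture vector and the blended population averages, and that the best blend is found by least squares. The function `best_two_way_mix` and all numbers below are made up for illustration and are not taken from the Oracle.

```r
# Best two-way mix of reference vectors a and b for a target vector,
# minimising the Euclidean distance || w * a + (1 - w) * b - target ||.
best_two_way_mix <- function(target, a, b) {
  d <- a - b
  denom <- sum(d * d)
  # If both references are identical, every weight fits equally well.
  w <- if (denom == 0) 0.5 else sum((target - b) * d) / denom
  w <- min(max(w, 0), 1)  # clip to a valid proportion
  fitted <- w * a + (1 - w) * b
  list(weight_a = w, distance = sqrt(sum((fitted - target)^2)))
}

# Toy 3-component example with made-up percentages.
ref_a  <- c(80, 20, 0)
ref_b  <- c(0, 20, 80)
target <- c(60, 20, 20)

fit <- best_two_way_mix(target, ref_a, ref_b)
cat(sprintf("%.1f%% ref_a + %.1f%% ref_b, distance %.4f\n",
            100 * fit$weight_a, 100 * (1 - fit$weight_a), fit$distance))
```

Under this reading, a mixed-mode search would repeat the fit for every pair of reference groups and report the k pairs with the smallest distance.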

HarappaWorld Admixture

Here is a new admixture calculator. This uses populations from all over the world, and I got the best results (i.e., lowest cross-validation error) at K=16.
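Choosing K by cross-validation error amounts to running ADMIXTURE with its --cv option for a range of K values and keeping the K with the smallest reported error. The sketch below parses log lines of the form ADMIXTURE prints in that mode; the error values are made up purely for illustration and are not the ones from my runs.

```r
# Made-up log lines in the format ADMIXTURE prints when run with --cv.
log_lines <- c(
  "CV error (K=14): 0.51230",
  "CV error (K=15): 0.51190",
  "CV error (K=16): 0.51170",
  "CV error (K=17): 0.51185"
)

# Extract K and the cross-validation error from each line.
k  <- as.integer(sub("^CV error \\(K=([0-9]+)\\): .*$", "\\1", log_lines))
cv <- as.numeric(sub("^CV error \\(K=[0-9]+\\): ", "", log_lines))

# The preferred K is the one with the lowest cross-validation error.
best_k <- k[which.min(cv)]
print(best_k)
```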

You can see the admixture results for different ethnic groups as well as results for individual (founder-only) project participants.

UPDATE: The population results have been calculated using weighted means.
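As a rough illustration of a weighted group mean, here is a small R sketch. The numbers are made up, and the weighting scheme shown (a participant who only partly belongs to a group counting for less) is an assumption chosen for the example.

```r
# Rows are individuals, columns are components (made-up percentages).
results <- rbind(
  c(50, 40, 10),
  c(46, 42, 12),
  c(30, 50, 20)
)
colnames(results) <- c("South Indian", "Baloch", "Caucasian")

# Hypothetical weights, e.g. 0.5 for a participant who is
# only half from this group.
weights <- c(1, 1, 0.5)

# Weighted mean of each component across individuals.
group_mean <- apply(results, 2, weighted.mean, w = weights)
print(group_mean)
```

Because each individual's row sums to 100, the weighted group mean also sums to 100 whatever the weights are.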

The group results are also shown in the usual interactive bar chart below. You can click on the component labels to sort by that ancestral component.

Do note that the admixture components do not necessarily represent real ancestral populations. Also, the names I have chosen for the components should be thought of as mnemonics to ease discussion. I chose them based on which populations in my data these components peaked in. They do not tell anything directly about ancestral populations. The best way to look at these admixture results is by comparing individuals and populations.

I used 188,173 SNPs for this run. However, the results reported above for the Henn2011 (181,223 SNPs for Hadza, Sandawe and San; 26,494 SNPs for other groups), Henn2012 (26,494 SNPs), Reich (48,967 SNPs) and Xing (18,986 SNPs) datasets were calculated using a lower number of common SNPs. Hence caution should be exercised in interpreting those results.
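To get a feel for how much smaller the overlap is for those datasets, the counts above can be expressed as a share of the full SNP set. This is only a sketch of the arithmetic; the variable names are mine.

```r
total_snps <- 188173

# Number of SNPs in common with the calculator, per dataset (from the text).
common_snps <- c(
  "Henn2011 (Hadza, Sandawe, San)" = 181223,
  "Henn2011 (other groups)"        = 26494,
  "Henn2012"                       = 26494,
  "Reich"                          = 48967,
  "Xing"                           = 18986
)

# Share of the full SNP set available for each dataset, in percent.
coverage_pct <- round(100 * common_snps / total_snps, 1)
print(coverage_pct)
```

The Xing samples end up with roughly a tenth of the SNPs, which is why their results deserve the most caution.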

You can also see the Fst distances between the ancestral components.

I should have HarappaWorldOracle and DIYHarappaWorld calculators out in the next few days.

Also, I am working on another calculator which will focus more closely on South Asia.