First of all, I want to point out that the HarappaWorld population averages are weighted means rather than simple averages of all samples' results. The weighting gives less importance to outliers. I find this a better solution than a simple average or a median: a median removes all outliers, but it also discards a lot of information.
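The exact weighting formula is not given here, so the following is only a minimal sketch of one way an outlier-downweighting mean could work. It assumes, purely for illustration, that each sample is weighted by how close its admixture vector lies to the component-wise median of the population. The function name `weighted_mean_admixture`, the weight formula, and the toy numbers are all hypothetical and are not taken from the HarappaWorld data or scripts.

```python
from math import sqrt
from statistics import median


def weighted_mean_admixture(samples):
    """Outlier-downweighting mean of admixture vectors (hypothetical scheme).

    samples: list of equal-length lists, one list of admixture
    fractions per individual.
    """
    n_comp = len(samples[0])
    # Component-wise median serves as a robust centre of the population.
    centre = [median(s[k] for s in samples) for k in range(n_comp)]
    # Euclidean distance of every sample from that centre.
    dists = [
        sqrt(sum((s[k] - centre[k]) ** 2 for k in range(n_comp)))
        for s in samples
    ]
    # Scale distances by their median; fall back to 1.0 if that median is 0.
    scale = median(dists) or 1.0
    # Assumed weight: near 1 for typical samples, near 0 for far outliers.
    weights = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]
    total = sum(weights)
    return [
        sum(w * s[k] for w, s in zip(weights, samples)) / total
        for k in range(n_comp)
    ]


# Made-up toy data: four similar individuals and one outlier (last row).
toy = [
    [0.60, 0.03],
    [0.58, 0.02],
    [0.61, 0.03],
    [0.59, 0.04],
    [0.40, 0.25],
]
plain = [sum(s[k] for s in toy) / len(toy) for k in range(2)]
weighted = weighted_mean_admixture(toy)
print("plain mean:   ", plain)
print("weighted mean:", weighted)
```

In this toy case the outlier pulls the plain mean of the second component well above the typical values, while the weighted mean stays close to them. Unlike a median, every sample still contributes something to the result.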
An example of the weighted mean effect can be seen in the Behar et al Armenian samples. Four of the samples have higher NE European percentages than the rest. As you can see in the table below, the weighting reduces their impact on the population results: the NE Euro average for the Behar samples drops from 8.96% to 5.35%.
| | Mean | Mean | Weighted Mean | Weighted Mean |
|---|---|---|---|---|
| Ethnicity | armenian | armenian | armenian | armenian |
| Dataset | behar | yunusbayev | behar | yunusbayev |
| N | 19 | 16 | 19 | 16 |
| S Indian | 0.37% | 0.52% | 0.41% | 0.52% |
| Baloch | 16.57% | 17.73% | 17.07% | 17.65% |
| Caucasian | 54.35% | 56.43% | 57.29% | 56.61% |
| NE Euro | 8.96% | 2.98% | 5.35% | 2.95% |
| SE Asian | 0.10% | 0.12% | 0.10% | 0.13% |
| Siberian | 0.49% | 0.09% | 0.29% | 0.09% |
| NE Asian | 0.14% | 0.08% | 0.16% | 0.09% |
| Papuan | 0.28% | 0.27% | 0.26% | 0.27% |
| American | 0.19% | 0.18% | 0.22% | 0.18% |
| Beringian | 0.26% | 0.19% | 0.23% | 0.20% |
| Mediterranean | 8.46% | 8.37% | 8.21% | 8.40% |
| SW Asian | 9.81% | 13.03% | 10.40% | 12.91% |
| San | 0.00% | 0.00% | 0.00% | 0.00% |
| E African | 0.02% | 0.00% | 0.01% | 0.00% |
| Pygmy | 0.00% | 0.00% | 0.00% | 0.00% |
| W African | 0.00% | 0.00% | 0.00% | 0.00% |
Another example is the Somali samples in the Reich et al data. One sample (out of 6) seems to be eastern Bantu. Let's compare the unweighted mean and weighted mean for Somalis in Reich et al and among Harappa participants.
| | Mean | Mean | Weighted Mean | Weighted Mean |
|---|---|---|---|---|
| Ethnicity | somali | somali | somali | somali |
| Dataset | harappa | reich | harappa | reich |
| N | 2 | 6 | 2 | 6 |
| S Indian | 0.00% | 1.62% | 0.00% | 1.49% |
| Baloch | 0.00% | 0.00% | 0.00% | 0.00% |
| Caucasian | 2.76% | 0.00% | 2.76% | 0.00% |
| NE Euro | 0.00% | 0.11% | 0.00% | 0.04% |
| SE Asian | 0.27% | 0.05% | 0.27% | 0.06% |
| Siberian | 0.00% | 0.04% | 0.00% | 0.05% |
| NE Asian | 0.00% | 0.41% | 0.00% | 0.46% |
| Papuan | 0.26% | 0.10% | 0.26% | 0.11% |
| American | 0.14% | 0.17% | 0.14% | 0.19% |
| Beringian | 0.23% | 0.33% | 0.23% | 0.38% |
| Mediterranean | 2.12% | 3.25% | 2.12% | 3.65% |
| SW Asian | 31.73% | 24.48% | 31.73% | 27.33% |
| San | 1.96% | 1.48% | 1.96% | 1.37% |
| E African | 60.37% | 56.75% | 60.37% | 60.13% |
| Pygmy | 0.15% | 1.78% | 0.15% | 1.23% |
| W African | 0.00% | 9.43% | 0.00% | 3.51% |
Also, I have divided Singapore Indians into 4 groups (actually 3 groups and 1 outlier) since they are so heterogeneous. Here are the weighted mean admixture proportions for all Singapore Indians and the four subgroups.
| Ethnicity | singapore-indian | singapore-indian-1 | singapore-indian-2 | singapore-indian-3 | singapore-indian-4 |
|---|---|---|---|---|---|
| Dataset | sgvp | sgvp | sgvp | sgvp | sgvp |
| N | 83 | 31 | 41 | 10 | 1 |
| S Indian | 53.57% | 61.95% | 50.39% | 33.68% | 27.81% |
| Baloch | 33.97% | 30.24% | 36.00% | 40.72% | 14.27% |
| Caucasian | 3.55% | 1.92% | 4.03% | 9.32% | 4.53% |
| NE Euro | 2.93% | 0.08% | 3.89% | 9.84% | 35.38% |
| SE Asian | 1.31% | 1.30% | 1.23% | 0.63% | 1.20% |
| Siberian | 0.45% | 0.47% | 0.44% | 0.43% | 1.19% |
| NE Asian | 0.92% | 0.91% | 0.80% | 1.19% | 3.26% |
| Papuan | 0.72% | 1.09% | 0.50% | 0.35% | 0.62% |
| American | 0.42% | 0.35% | 0.44% | 0.69% | 1.29% |
| Beringian | 0.56% | 0.38% | 0.65% | 0.76% | 0.00% |
| Mediterranean | 0.67% | 0.40% | 0.72% | 1.33% | 10.38% |
| SW Asian | 0.90% | 0.86% | 0.87% | 1.05% | 0.06% |
| San | 0.01% | 0.00% | 0.01% | 0.00% | 0.00% |
| E African | 0.03% | 0.02% | 0.04% | 0.00% | 0.00% |
| Pygmy | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| W African | 0.01% | 0.01% | 0.00% | 0.00% | 0.00% |
I have updated the spreadsheet as well as HarappaWorld Oracle.