I got access to the Reich et al. (Nature 2009) dataset from their paper "Reconstructing Indian population history".
It has the following populations:
Aonaga, Aus, Bhil, Chenchu, Great_Andamanese, Hallaki, Kamsali, Kashmiri_Pandit, Kharia, Kurumba, Lodi, Madiga, Mala, Meghawal, Naidu, Nysha, Onge, Sahariya, Santhal, Satnami, Siddi, Somali, Srivastava, Tharu, Vaish, Velama, Vysya.
Their dataset has 141 individuals typed at 587,753 SNPs and, conveniently, is in PED format.
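For anyone who hasn't worked with it: PED is PLINK's plain-text genotype format, with one line per individual, six fixed columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) and then two allele columns per SNP, in the SNP order given by the companion .map file. Below is a minimal sketch of reading one line, assuming whitespace-delimited fields and PLINK's default missing-allele code "0". The helper names and the IDs in the example are made up for illustration, not taken from the Reich et al. files.

```python
from typing import Dict, List, Tuple

# PED layout: family ID, individual ID, paternal ID, maternal ID, sex,
# phenotype, then two allele columns per SNP (order given by the .map file).
N_FIXED_COLUMNS = 6

def parse_ped_line(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Split one PED line into (family_id, individual_id, genotypes).

    Each genotype is an (allele1, allele2) pair; "0" marks a missing allele.
    """
    fields = line.split()  # handles both space- and tab-delimited files
    if len(fields) < N_FIXED_COLUMNS or (len(fields) - N_FIXED_COLUMNS) % 2:
        raise ValueError("malformed PED line: need 6 fixed columns plus allele pairs")
    family_id, individual_id = fields[0], fields[1]
    alleles = fields[N_FIXED_COLUMNS:]
    # Pair up consecutive allele columns: (a1, a2), (a3, a4), ...
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return family_id, individual_id, genotypes

def genotypes_by_snp(ped_line: str, snp_ids: List[str]) -> Dict[str, Tuple[str, str]]:
    """Attach SNP IDs (listed in .map file order) to one individual's genotypes."""
    _, _, genotypes = parse_ped_line(ped_line)
    if len(genotypes) != len(snp_ids):
        raise ValueError("number of genotypes does not match number of SNPs in the map")
    return dict(zip(snp_ids, genotypes))

# Hypothetical example line with three SNPs, the last one missing.
print(genotypes_by_snp("FAM1 IND1 0 0 1 -9 A G C C 0 0", ["rs1", "rs2", "rs3"]))
# → {'rs1': ('A', 'G'), 'rs2': ('C', 'C'), 'rs3': ('0', '0')}
```

For a file with hundreds of thousands of SNPs per line this is slow compared with PLINK itself; it is only meant to show how the columns line up.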
Also, Blaise pointed me to the Pan-Asian SNP data used in the Dec 2009 Science paper "Mapping Human Genetic Diversity in Asia".
It includes the following 71 populations:
Maya, Auca, Quechua, Karitiana, Pima, Ami, Atayal, Melanesians, Zhuang, Han_Cantonese, Hmong, Jiamao, Jinuo, Han_Shanghai, Uyghur, Wa, Alorese, Dayak, Javanese, Batak_Karo, Lamaholot, Lembata, Malay, Mentawai, Manggarai, Kambera, Sunda, Batak_Toba, Toraja, Andhra_Pradesh, Karnataka, Bengali-Assamese, Rajasthan, Uttaranchal, Uttar Pradesh, Haryana, Spiti, Bhili, Marathi, Japanese, Ryukyuan, Korean, Bidayuh, Jehai, Kelantan, Kensiu, Temuan, Ayta, Agta, Ati, Iraya, Minanubu, Mamanwa, Filipino, Singapore_Chinese, Singapore_Indian, Singapore_Malay, Hmong (Miao), Karen, Lawa, Mlabri, Mon, Paluang, Plang, Tai_Khuen, Tai_Lue, H'tin, Tai_Yuan, Tai_Yong, Yao, Hakka, Minnan.
It has 1,719 individuals typed at 54,794 SNPs. Given the wealth of populations, I wish it had more SNPs.
Also, the Pan-Asian data is coded as minor allele counts, so I need to convert it back to A/C/G/T genotypes. Since the dataset includes some HapMap populations, that shouldn't be too hard: the publicly available HapMap genotypes for those samples should reveal which base each count refers to.
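Here is a minimal sketch of how that conversion could work for a single SNP. It assumes (1) each individual has a count of 0, 1 or 2 copies of the counted allele, with None for missing; (2) the counted allele is the same for every individual at a given SNP; (3) some individuals in the Pan-Asian file can be matched by ID to HapMap genotypes with no missing calls. The function names and sample IDs are hypothetical. The idea is simply to try both bases as the "counted" one and keep whichever reproduces the HapMap genotypes.

```python
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]

def count_to_genotype(count: int, counted: str, other: str) -> Genotype:
    """Turn a count (0, 1 or 2) of the counted allele into a sorted pair of bases."""
    if count not in (0, 1, 2):
        raise ValueError(f"allele count must be 0, 1 or 2, got {count!r}")
    pair = sorted([counted] * count + [other] * (2 - count))
    return (pair[0], pair[1])

def infer_counted_allele(counts: Dict[str, Optional[int]],
                         hapmap: Dict[str, Genotype]) -> Optional[Tuple[str, str]]:
    """Decide which base the counts refer to, using individuals also in HapMap.

    Returns (counted_base, other_base), or None if the overlap can't decide:
    no overlap, SNP monomorphic in the overlap, or both/neither orientation fits.
    """
    # Individuals with a non-missing count AND a HapMap genotype.
    shared = {ind: c for ind, c in counts.items() if c is not None and ind in hapmap}
    bases = sorted({base for ind in shared for base in hapmap[ind]})
    if len(bases) != 2:
        return None
    a, b = bases

    def fits(counted: str, other: str) -> bool:
        return all(count_to_genotype(c, counted, other) == tuple(sorted(hapmap[ind]))
                   for ind, c in shared.items())

    fits_a, fits_b = fits(a, b), fits(b, a)
    if fits_a == fits_b:  # e.g. only heterozygotes overlap, or the data disagree
        return None
    return (a, b) if fits_a else (b, a)

def convert_snp(counts: Dict[str, Optional[int]],
                hapmap: Dict[str, Genotype]) -> Optional[Dict[str, Optional[Genotype]]]:
    """Convert one SNP's counts for every individual to bases (None = missing)."""
    orientation = infer_counted_allele(counts, hapmap)
    if orientation is None:
        return None  # leave the SNP out rather than guess
    counted, other = orientation
    return {ind: None if c is None else count_to_genotype(c, counted, other)
            for ind, c in counts.items()}

# Hypothetical SNP: three HapMap samples plus two other samples.
counts = {"HM1": 0, "HM2": 1, "HM3": 2, "ASN1": 2, "ASN2": None}
hapmap = {"HM1": ("G", "G"), "HM2": ("G", "A"), "HM3": ("A", "A")}
print(convert_snp(counts, hapmap))
# → {'HM1': ('G', 'G'), 'HM2': ('A', 'G'), 'HM3': ('A', 'A'), 'ASN1': ('A', 'A'), 'ASN2': None}
```

Note that this never needs to know which allele is actually the minor one; the HapMap overlap settles the orientation directly. A SNP where the overlapping samples are all heterozygous can't be resolved this way and is dropped.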
I am going to add both of these datasets to my big reference set.
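One common way to combine genotype datasets is to keep only the SNPs typed in all of them, which means any merge including the Pan-Asian data is capped at its 54,794 SNPs. Below is a minimal sketch of that step, assuming both datasets are already in an in-memory layout of individual ID → SNP ID → base pair (a hypothetical structure, not how either dataset is distributed). It also drops shared SNPs whose pooled alleles exceed two, which catches some strand mismatches between sources.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[str, str]
# Hypothetical in-memory layout: individual ID -> SNP ID -> genotype.
Dataset = Dict[str, Dict[str, Genotype]]
MISSING = "0"  # PLINK-style missing allele code (assumed)

def observed_alleles(ds: Dataset, snp: str) -> Set[str]:
    """All non-missing bases seen at one SNP in one dataset."""
    return {base for genos in ds.values() for base in genos.get(snp, ()) if base != MISSING}

def merge_on_shared_snps(datasets: List[Dataset]) -> Tuple[List[str], Dataset]:
    """Keep SNPs typed in every dataset whose alleles agree, then pool individuals."""
    if not datasets:
        return [], {}
    snp_sets = [set().union(*(genos.keys() for genos in ds.values())) for ds in datasets]
    shared = set.intersection(*snp_sets)
    # A SNP reported as A/G in one dataset and C/T in another (opposite strands)
    # shows more than two bases when pooled; drop it rather than guess.
    # A/T and C/G SNPs flipped across strands are NOT caught by this check.
    # Sorted by ID only for determinism; real data would follow .map positions.
    kept = sorted(snp for snp in shared
                  if len(set().union(*(observed_alleles(ds, snp) for ds in datasets))) <= 2)
    merged: Dataset = {}
    for ds in datasets:
        for ind, genos in ds.items():
            if ind in merged:
                raise ValueError(f"individual {ind!r} appears in more than one dataset")
            merged[ind] = {snp: genos.get(snp, (MISSING, MISSING)) for snp in kept}
    return kept, merged

# Hypothetical example: rs1 is shared and consistent, rs2 looks strand-flipped.
ds1 = {"I1": {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "A")}}
ds2 = {"J1": {"rs1": ("G", "G"), "rs2": ("A", "G"), "rs4": ("C", "T")}}
print(merge_on_shared_snps([ds1, ds2]))
# → (['rs1'], {'I1': {'rs1': ('A', 'G')}, 'J1': {'rs1': ('G', 'G')}})
```

In practice PLINK's own merge commands handle this at scale; the sketch only shows the logic of restricting to shared, consistent SNPs.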