I got access to the Reich et al (Nature 2009) dataset used in their paper "Reconstructing Indian population history".
It has the following populations:
Aonaga | Aus | Bhil |
Chenchu | Great_Andamanese | Hallaki |
Kamsali | Kashmiri_Pandit | Kharia |
Kurumba | Lodi | Madiga |
Mala | Meghawal | Naidu |
Nysha | Onge | Sahariya |
Santhal | Satnami | Siddi |
Somali | Srivastava | Tharu |
Vaish | Velama | Vysya |
Their dataset has 141 individuals and 587,753 SNPs and, conveniently, is in PED format.
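As a quick sanity check on those numbers, here is a minimal sketch that counts individuals and SNPs from PED lines. It assumes the standard PLINK layout (six leading columns for family ID, individual ID, father, mother, sex and phenotype, followed by two allele columns per SNP, all whitespace separated). The function name `ped_shape` is made up for illustration and is not part of any tool.

```python
def ped_shape(lines):
    """Return (number of individuals, number of SNPs) for PED-formatted lines.

    Assumes the standard PLINK PED layout: 6 leading columns, then
    2 allele columns per SNP. `ped_shape` is a hypothetical helper name.
    """
    n_individuals = 0
    n_snps = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        allele_fields = len(fields) - 6
        if allele_fields < 0 or allele_fields % 2 != 0:
            raise ValueError("line does not look like a PED record")
        snps = allele_fields // 2
        if n_snps is None:
            n_snps = snps
        elif snps != n_snps:
            raise ValueError("rows have different numbers of SNPs")
        n_individuals += 1
    return n_individuals, (n_snps or 0)
```

With a real file this could be called as `ped_shape(open(path))`, where `path` is wherever the PED file lives. If the file matches the description above, the result should be 141 individuals and 587,753 SNPs.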
Also, Blaise pointed me to the Pan-Asian SNP data used in the Dec 2009 Science paper "Mapping Human Genetic Diversity in Asia".
It includes the following 71 populations:
Maya | Auca | Quechua | Karitiana | Pima |
Ami | Atayal | Melanesians | Zhuang | Han_Cantonese |
Hmong | Jiamao | Jinuo | Han_Shanghai | Uyghur |
Wa | Alorese | Dayak | Javanese | Batak_Karo |
Lamaholot | Lembata | Malay | Mentawai | Manggarai |
Kambera | Sunda | Batak_Toba | Toraja | Andhra_Pradesh |
Karnataka | Bengali-Assamese | Rajasthan | Uttaranchal | Uttar Pradesh |
Haryana | Spiti | Bhili | Marathi | Japanese |
Ryukyuan | Korean | Bidayuh | Jehai | Kelantan |
Kensiu | Temuan | Ayta | Agta | Ati |
Iraya | Minanubu | Mamanwa | Filipino | Singapore_Chinese |
Singapore_Indian | Singapore_Malay | Hmong (Miao) | Karen | Lawa |
Mlabri | Mon | Paluang | Plang | Tai_Khuen |
Tai_Lue | H'tin | Tai_Yuan | Tai_Yong | Yao |
Hakka | Minnan |
It has 1,719 individuals with 54,794 SNPs. I wish it had more SNPs, considering the wealth of populations.
Also, the Pan-Asian data is in the form of minor allele counts, so I need to convert those back to A/C/G/T genotypes. Since some HapMap populations are included in the dataset, that shouldn't be too hard.
I am going to add both of these datasets to my big reference set.
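The conversion step could look roughly like the sketch below. It assumes each genotype is stored as the number of minor allele copies (0, 1 or 2), and that the major and minor allele letters for each SNP have already been worked out, for example by comparing against the HapMap samples. The function name `count_to_genotype` and the use of `None` for a missing count are assumptions made for illustration; "0" as the missing allele code follows the PED convention.

```python
def count_to_genotype(minor_count, major, minor, missing="0"):
    """Convert a minor allele count into a pair of allele letters.

    Assumptions (not from the Pan-Asian data documentation):
    - minor_count is 0, 1 or 2, or None when the genotype is missing
    - major/minor are the allele letters for this SNP, e.g. "A" and "G"
    """
    if minor_count is None:
        return (missing, missing)
    if minor_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    # (2 - count) copies of the major allele, then count copies of the minor
    return (major,) * (2 - minor_count) + (minor,) * minor_count
```

For a SNP with major allele A and minor allele G, counts of 0, 1 and 2 would give A A, A G and G G, which can be written directly into the allele columns of a PED file. Working out which letter is the minor allele for each SNP is the harder part and is not covered by this sketch.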