Henn Duplicates

Posted by Zack on April 6, 2011

As part of my effort to build one big reference dataset for my own use, I have been going over all the datasets I have to make sure there are no duplicates, relatives, or other oddities that could cause problems in my analysis.

So I went back to the Henn et al. dataset, which you can download from their website.

There are 107 samples in common with HapMap (IDs start with NA) and 131 in common with HGDP (IDs start with HGDP).
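If you want to reproduce that kind of tally on your own files, here is a minimal sketch that classifies sample IDs by prefix. It assumes, as above, that HapMap IDs start with "NA" and HGDP IDs with "HGDP", and that the individual ID is the second column of a PLINK PED file. The function names and the example IDs are hypothetical, for illustration only.

```python
from collections import Counter


def iids_from_ped(path):
    """Return the individual IDs (2nd column) from a PLINK PED file."""
    with open(path) as fh:
        return [line.split()[1] for line in fh if line.strip()]


def count_by_source(iids):
    """Tally sample IDs by reference-panel prefix.

    Assumes HapMap IDs start with "NA" and HGDP IDs with "HGDP";
    anything else is counted as "other".
    """
    counts = Counter()
    for iid in iids:
        if iid.startswith("HGDP"):
            counts["HGDP"] += 1
        elif iid.startswith("NA"):
            counts["HapMap"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical IDs, not taken from the Henn et al. files.
example = ["NA18486", "NA19238", "HGDP00927", "SAN001"]
print(count_by_source(example))
```

On the real data you would pass `iids_from_ped(...)` on the downloaded PED file instead of the hypothetical list.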

Henn et al. provide two PED files: one for the Khoisan data and one for the all-Africa 55k SNP set. Unfortunately, 31 San individuals appear in both PED files with the same individual IDs but different family IDs (SAN and SAN_SA). Since PLINK identifies each individual by the FID/IID pair, it treats these as different people and does not merge them automatically. Just remove all the records with the SAN_SA FID, since they have fewer SNPs. All the IBD info etc. is in this spreadsheet.
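One way to build the exclusion list is sketched below, assuming standard PLINK PED files where the first two columns are FID and IID. It flags any IID that shows up under more than one FID and lists only its SAN_SA copy for removal, in the whitespace-separated "FID IID" layout that PLINK's `--remove` option reads. The function names are hypothetical; this is a sketch, not the procedure Henn et al. or PLINK itself use.

```python
from collections import defaultdict


def read_fid_iid(ped_path):
    """Yield (FID, IID) from the first two columns of a PLINK PED file."""
    with open(ped_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                yield fields[0], fields[1]


def remove_list(records, drop_fid="SAN_SA"):
    """Return sorted (FID, IID) pairs to exclude.

    An IID is flagged when it occurs under more than one FID;
    only its copy under drop_fid is listed for removal.
    """
    fids_by_iid = defaultdict(set)
    for fid, iid in records:
        fids_by_iid[iid].add(fid)
    return sorted(
        (drop_fid, iid)
        for iid, fids in fids_by_iid.items()
        if drop_fid in fids and len(fids) > 1
    )


def write_remove_file(pairs, out_path):
    """Write one 'FID IID' pair per line, the layout --remove expects."""
    with open(out_path, "w") as out:
        for fid, iid in pairs:
            out.write(f"{fid} {iid}\n")
```

In practice you would chain `read_fid_iid` over both PED files (for example with `itertools.chain`), pass the result to `remove_list`, write it out with `write_remove_file`, and then give that file to PLINK via `--remove` when merging or making the binary fileset.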

Category: Dataset | Tags: africa, duplicate, henn, ibd, reference


2 Comments.

A best case scenario for unsupervised ADMIXTURE? | Gene Expression | Discover Magazine - pingback on April 7, 2011 at 4:59 pm
A best case scenario for unsupervised ADMIXTURE? | Biology News by Biologged - pingback on April 8, 2011 at 1:34 am


Harappa Ancestry Project

Genetics and South Asia
