Behar Redo

Posted by Zack on April 2, 2011

As part of my effort to create one big reference dataset for my own use, I have been going over all the datasets I have to make sure there are no duplicates, relatives, or other oddities that could cause issues with my analysis.

So I went back to the Behar et al. dataset, which you can download from the GEO Accession website.

I found three sets of duplicates and two pairs with very high identity-by-descent (IBD) values, which I calculated using PLINK. You can see the sample pairs with PI_HAT greater than 0.5 in this spreadsheet. PI_HAT is the proportion of IBD estimated by PLINK. Notice that all these pairs also have high IBS similarity (the DST column), above 83%.
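For anyone who wants to repeat this kind of check, here is a minimal sketch of the filtering step in Python. It assumes the pairwise table was produced with PLINK's `--genome` option (for example `plink --bfile behar --genome --out behar`, where the `behar` file prefix is only a placeholder) and that the output has the usual whitespace-delimited header including `IID1`, `IID2`, `PI_HAT`, and `DST` (the IBS column). The function name `related_pairs` and the sample rows are made up for illustration; they are not the actual values from the spreadsheet.

```python
import io


def related_pairs(genome_file, pi_hat_min=0.5):
    """Return (IID1, IID2, PI_HAT, DST) for every pair with PI_HAT > pi_hat_min.

    genome_file is any open text file in PLINK .genome layout:
    a header row followed by one whitespace-delimited row per sample pair.
    """
    header = genome_file.readline().split()
    col = {name: i for i, name in enumerate(header)}
    pairs = []
    for line in genome_file:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pi_hat = float(fields[col["PI_HAT"]])
        if pi_hat > pi_hat_min:
            dst = float(fields[col["DST"]])
            pairs.append((fields[col["IID1"]], fields[col["IID2"]], pi_hat, dst))
    return pairs


# Made-up rows, only to show the expected layout (not real Behar samples).
example = io.StringIO(
    " FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO\n"
    " F1 S1 F2 S2 UN NA 0.00 0.00 1.00 1.00 -1 1.00 1.0 NA\n"
    " F3 S3 F4 S4 UN NA 0.20 0.56 0.24 0.52 -1 0.85 1.0 NA\n"
    " F5 S5 F6 S6 UN NA 0.98 0.02 0.00 0.01 -1 0.72 0.4 2.0\n"
)
flagged = related_pairs(example)
for iid1, iid2, pi_hat, dst in flagged:
    print(iid1, iid2, pi_hat, dst)
```

With a real file you would pass `open("behar.genome")` instead of the in-memory example. A PI_HAT near 1 suggests a duplicate, while values around 0.5 are what you would expect for first-degree relatives.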

The five samples I have removed as a result of this are listed in this spreadsheet.
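One simple way to turn flagged pairs into a removal list is to drop one sample from each pair. The sketch below is just one possible rule, not necessarily how I picked the five samples: it drops the second sample of each pair unless one member of the pair is already on the removal list. The function name `samples_to_drop` and the sample IDs are hypothetical.

```python
def samples_to_drop(pairs):
    """Pick one sample per related pair to remove.

    pairs is a list of (sample_a, sample_b) tuples. If either member of a
    pair is already marked for removal, the pair is considered resolved.
    Otherwise the second member is dropped (an arbitrary choice).
    """
    dropped = []
    for first, second in pairs:
        if first in dropped or second in dropped:
            continue  # this pair is already broken up
        dropped.append(second)
    return dropped


# Hypothetical IDs: five disjoint pairs give five samples to remove.
flagged_pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2"), ("E1", "E2")]
to_remove = samples_to_drop(flagged_pairs)
print(to_remove)
```

In practice you might prefer to keep whichever sample in each pair has the better genotyping call rate. The resulting IDs, together with their family IDs, can be written to a text file and passed to PLINK's `--remove` option.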

Category: Dataset. Tags: behar, duplicate, ibd, reference


1 Comment.

Relatives in Datasets | Harappa Ancestry Project - pingback on February 6, 2012 at 6:11 pm


Harappa Ancestry Project

Genetics and South Asia
