Two Steps Forward, Two Steps Back

I got my daughter a netbook, so now my computer is doing Harappa Project work 24x7.

Also, Simranjit was nice enough to offer me the use of a server. For privacy reasons, I am not going to upload any of the participants' data there, but it is much faster than my machine and hence very useful for running Admixture on the reference data (especially with cross-validation).

As for steps back, I downloaded the current 1000genomes data (1,212 samples, 2.4 million SNPs). It's in VCF format. Using vcftools to convert it to PED format will take about 3 weeks. Yes, you heard that right. BTW, the good stuff from a South Asian point of view will come later this year with 100 Assamese Ahom, 100 Kayadtha from Calcutta, 100 Reddys from Hyderabad, 100 Maratha from Bombay and 100 Lahori Punjabis.

Also, I spent most of Sunday evening and night in the ER and got a diagnosis of ureterolithiasis for my efforts. All I can say is: Three cheers for Percocet!!

UPDATE: Dienekes was kind enough to send me his conversion code which, judging from the source, should run really fast.

I am still astonished that the vcftools conversion code is so slow. Maybe I should look at their source code.
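For anyone wondering what the conversion involves: VCF stores one row per SNP and PED stores one row per sample, so it is basically a big transpose. Here is a minimal sketch of that idea in Python. It is not the vcftools code and not Dienekes' code; the function name `vcf_to_ped_map` is made up for illustration, and it assumes biallelic SNPs, diploid calls, `GT` as the first FORMAT field, and a file small enough to hold in memory.

```python
def vcf_to_ped_map(vcf_lines):
    """Transpose VCF text (one row per SNP) into PED rows (one per sample)
    plus MAP rows (one per SNP). Illustrative sketch only."""
    samples = []
    genotypes = []  # genotypes[i] collects allele pairs for sample i
    map_rows = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            genotypes = [[] for _ in samples]
            continue
        chrom, pos, snp_id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # assumption: keep only biallelic SNPs
        alleles = {"0": ref, "1": alt}
        if snp_id == ".":
            snp_id = f"{chrom}:{pos}"  # made-up fallback ID for unnamed SNPs
        # MAP columns: chromosome, SNP ID, genetic distance (0 = unknown), position
        map_rows.append(f"{chrom}\t{snp_id}\t0\t{pos}")
        for i, call in enumerate(fields[9:]):
            # assumption: GT is the first colon-separated field of each call
            gt = call.split(":", 1)[0].replace("|", "/").split("/")
            pair = [alleles.get(a, "0") for a in gt]  # "0" marks a missing allele
            if len(pair) != 2:
                pair = ["0", "0"]  # treat non-diploid calls as missing
            genotypes[i].append(" ".join(pair))
    # PED columns: family ID, individual ID, father, mother, sex, phenotype, genotypes
    ped_rows = [
        " ".join([s, s, "0", "0", "0", "-9"] + genotypes[i])
        for i, s in enumerate(samples)
    ]
    return ped_rows, map_rows
```

Holding everything in memory like this would not be practical for 1,212 samples and 2.4 million SNPs, so a real run would need to work chromosome by chromosome or in chunks.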

10 Comments.

  1. Best wishes on a speedy recovery.

  2. get better man! we need you! 🙂

  3. You have my best wishes as well. I hope you recover soon.

    Also, that is very cool of Dienekes to help you out like that!

  4. Feel better soon!

  5. 1000genomes | Harappa Ancestry Project - pingback on April 10, 2011 at 9:29 am

  6. Get well soon, Zack!

  7. Didn't notice this post at all - hope you have a speedy recovery Zack :).