Even though the Pan-Asian dataset is not public, there was a request for my script to convert the data to Plink's PED format.
Here is how I convert the Pan-Asian data to Plink's transposed file format (a .tped/.tfam pair), which Plink can then convert to PED.
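#!/usr/bin/perl
use strict;
use warnings;

# Input: tab-delimited genotype file. The first five columns describe the SNP;
# the remaining columns hold one genotype per sample.
my $file = "Genotypes_All.txt";
open(my $in,   "<", $file)           or die "Cannot open $file: $!";
open(my $tfam, ">", "panasian.tfam") or die "Cannot write panasian.tfam: $!";
open(my $tped, ">", "panasian.tped") or die "Cannot write panasian.tped: $!";

# Header line: sample IDs start in the sixth column.
my $header = <$in>;
chomp $header;
my @first = split /\t/, $header;
foreach my $sample (5 .. $#first) {
    # TFAM: family ID (0), sample ID, father (0), mother (0),
    # sex (0 = unknown), phenotype (-9 = missing)
    print $tfam "0 $first[$sample] 0 0 0 -9\n";
}

while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\t/, $line;
    # Column 5 holds the alleles as "major/minor".
    my ($major, $minor) = split m{/}, $fields[4];
    # TPED: chromosome, SNP ID, genetic distance (set to 0), bp position
    print $tped "$fields[2] $fields[1] 0 $fields[3]";
    foreach my $i (5 .. $#fields) {
        my $g = $fields[$i];
        my $alleles;
        # Genotypes count minor alleles. Compare as strings so that a
        # non-numeric missing code is not treated as 0 (homozygous major).
        if    ($g eq '0') { $alleles = "$major $major"; }
        elsif ($g eq '1') { $alleles = "$major $minor"; }
        elsif ($g eq '2') { $alleles = "$minor $minor"; }
        else              { $alleles = "0 0"; }    # anything else -> missing
        print $tped " $alleles";
    }
    print $tped "\n";
}

close($in);
close($tfam);
close($tped);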
Again, no guarantees! It's Perl, though, so it should be more portable across operating systems.
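If you want to check that your copy of Genotypes_All.txt has the layout the script expects before running it on the whole file, the per-row logic can be pulled out into a subroutine and tried on a single record. This is a minimal sketch: the column layout is inferred from the script rather than documented anywhere (column 1 unused, then SNP ID, chromosome, base-pair position, alleles as "major/minor", then one genotype per sample coded as the number of minor alleles), and both the subroutine name `tped_line` and the record fed to it are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert one tab-delimited row of Genotypes_All.txt
# into one TPED line. Column layout as inferred from the conversion script:
#   [0] not used, [1] SNP ID, [2] chromosome, [3] bp position,
#   [4] alleles as "major/minor", [5..] genotypes (count of minor alleles)
sub tped_line {
    my ($row) = @_;
    my @f = split /\t/, $row;
    my ($major, $minor) = split m{/}, $f[4];
    my %decode = (
        '0' => "$major $major",
        '1' => "$major $minor",
        '2' => "$minor $minor",
    );
    # Any code other than 0/1/2 becomes PLINK's missing genotype "0 0".
    my @genotypes = map { exists $decode{$_} ? $decode{$_} : "0 0" } @f[5 .. $#f];
    return join(" ", $f[2], $f[1], 0, $f[3], @genotypes);
}

# Made-up record: one SNP typed in four samples, the last one missing.
my $row = join("\t", "x", "rs0001", "1", "123456", "A/G", 0, 1, 2, "NA");
print tped_line($row), "\n";    # prints: 1 rs0001 0 123456 A A A G G G 0 0
```

A lookup hash replaces the if/elsif chain here; either way, any code other than 0, 1 or 2 ends up as the missing genotype "0 0".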
Hats off to you!
You are the man. Keep it up.
Hey Zack, do you know of a way to output a list of samples in a particular order when using the --keep flag in PLINK?
Not that I know of.
How about the .map file?
Simply use plink to convert from tped/tfam to ped/map or bed/bim/fam.
I can't find a command in plink that does that... do you know what it is?
Never mind, it's done 🙂 Thanks for the good script!
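For reference, the tped/tfam to ped/map or bed/bim/fam conversion mentioned above can be run straight after the conversion script. A minimal sketch, assuming a PLINK 1.x binary called `plink` is on your PATH and the script's output files are named panasian.tped and panasian.tfam: `--tfile` reads the transposed pair, `--recode` writes PED/MAP, and `--make-bed` writes the binary BED/BIM/FAM set. The helper name `plink_commands` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: PLINK command lines that turn PREFIX.tped/PREFIX.tfam
# into PREFIX.ped/PREFIX.map (--recode) and PREFIX.bed/.bim/.fam (--make-bed).
sub plink_commands {
    my ($prefix) = @_;
    return (
        [ 'plink', '--tfile', $prefix, '--recode',   '--out', $prefix ],
        [ 'plink', '--tfile', $prefix, '--make-bed', '--out', $prefix ],
    );
}

my $prefix = 'panasian';
if (-e "$prefix.tped" && -e "$prefix.tfam") {
    foreach my $cmd (plink_commands($prefix)) {
        # List form of system() runs plink directly, without a shell.
        system(@$cmd) == 0 or die "Failed: @$cmd\n";
    }
} else {
    warn "$prefix.tped/$prefix.tfam not found; run the conversion script first.\n";
}
```

Typing `plink --tfile panasian --recode --out panasian` (or `--make-bed` in place of `--recode`) at the command line does the same thing.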