2016-08-23 214 views

Answers

0

Searching online, I could not find a tool that does this. You might consider using the VCF format instead, which EPACTS appears to accept:

http://genome.sph.umich.edu/wiki/EPACTS#VCF_file_for_Genotypes

You can convert PED to VCF with plink like this:

plink --file prefix --recode vcf --out prefix 

You may need to fiddle with additional options to get it to produce the output you need; see https://www.cog-genomics.org/plink2/data#recode, specifically:

The 'vcf', 'vcf-fid', and 'vcf-iid' modifiers result in production of a 
VCFv4.2 file. 'vcf-fid' and 'vcf-iid' cause family IDs and within-family IDs 
respectively to be used for the sample IDs in the last header row, while 
'vcf' merges both IDs and puts an underscore between them (in this case, a 
warning will be given if an ID already contains an underscore). 
    If the 'bgz' modifier is added, the VCF file is block-gzipped. (Gzipping 
of other --recode output files is not currently supported.) 
    The A2 allele is saved as the reference and normally flagged as not 
based on a real reference genome ('PR' INFO field value). When it is 
important for reference alleles to be correct, you'll usually also want to 
include --a2-allele and --real-ref-alleles in your command. 
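
As a rough sketch of how the modifiers quoted above combine, the command below uses 'vcf-iid' so that only within-family IDs become sample IDs and adds 'bgz' to block-gzip the result. The prefix "mydata" is a hypothetical placeholder, and the guard around the call is only there so the snippet does nothing harmful when plink or the input files are absent.

```shell
#!/bin/sh
# Hypothetical prefix: expects mydata.ped and mydata.map in the current directory.
prefix="mydata"

# vcf-iid: sample IDs are the within-family IDs; bgz: block-gzip the VCF output.
cmd="plink --file $prefix --recode vcf-iid bgz --out $prefix"

# Run only if plink is installed and the input exists; otherwise print the command.
if command -v plink >/dev/null 2>&1 && [ -f "$prefix.ped" ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```

If correct reference alleles matter, the quoted documentation suggests also adding --a2-allele and --real-ref-alleles; the allele file that --a2-allele needs is not shown here.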
0

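EPACTS requires both a VCF file and a PED file as input for association analysis. Unlike the PED file described in the PLINK documentation, the PED file used by EPACTS does not contain genotype data. Its purpose is to hold your phenotype data and covariates, and it needs the .ped extension to be recognized by EPACTS.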

To export a data frame from R as a PED file, you only need to give the output file a .ped extension; you can use the following command:

write.table(df, "filename.ped", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

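EPACTS also requires that the header line containing the column names be commented out. I usually just do this step by hand, since adding the '#' is very quick and I always open my files to check them anyway. Alternatively, you can set col.names = FALSE and use a .dat file instead, as shown in the EPACTS documentation: https://genome.sph.umich.edu/wiki/EPACTS#PED_file_for_Phenotypes_and_Covariates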
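
If you would rather not edit the header by hand, one possible way to script it is with sed, which can prepend '#' to the first line only. The sketch below builds a tiny tab-delimited file so it can be run on its own; the file names, column names, and values are illustrative assumptions, not taken from real data.

```shell
#!/bin/sh
# Build a small tab-delimited example file (made-up columns and values).
printf 'FAM_ID\tIND_ID\tFAT_ID\tMOT_ID\tSEX\tPHENO\n' > example.ped
printf 'F1\tI1\t0\t0\t1\t2.5\n' >> example.ped

# Prepend '#' to line 1 only; all other lines are copied unchanged.
sed '1s/^/#/' example.ped > example.commented.ped

# Show the commented header line.
head -n 1 example.commented.ped
```

Writing to a new file rather than editing in place keeps the original export available in case the result needs checking.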