How to pass wildcard arguments to a perl script in a Snakefile?

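I'm trying to write a rule in a Snakefile that runs a custom perl script. There are two input files and one output file. Both the input and the output files contain wildcards, because I want to run the script on various files. But when I use expand() to generate the different input and output files, the perl script receives all possible input files at once, whereas I want them fed in one at a time. What should I do so that perl "eats" the input files one by one? Here is my code: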

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"] 
SUPERGROUPS=["2supergroups","5supergroups"] 

rule add_supergroups: 
    input: 
     newick=expand("data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip",domain=DOMAINS, supergroup=SUPERGROUPS), 
     sup="data/species.v3.1.1.supergroups.txt" 
    output: 
     expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS) 
    shell: 
     "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"


2017-03-16 lvw

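The reason your rule runs on all the files at once is simple: the expand() function.

As you seem to know, expand() produces Python lists of strings, which is very handy for managing files in Snakemake.

But in your example, the rule runs the perl script a single time, with the whole list of {input.newick} files plus the one {input.sup} file, and it expects the whole list of files as output.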
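To see why the rule receives every file at once, here is a minimal Python sketch that mimics what expand() does by default: it fills the pattern with every combination (a Cartesian product) of the keyword lists. The function expand_like is hypothetical, not Snakemake's API, and the file names are shortened for readability. When a list like this ends up in {input.newick} or {output}, Snakemake joins it with spaces in the shell command, so the command below is the single job the original rule would run.

```python
from itertools import product

def expand_like(pattern, **values):
    """Hypothetical stand-in for Snakemake's expand(): fill `pattern`
    with every combination (Cartesian product) of the value lists."""
    names = list(values)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(values[name] for name in names))
    ]

DOMAINS = ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS = ["2supergroups", "5supergroups"]
SUP = "data/species.v3.1.1.supergroups.txt"

# Shortened file names (assumption, for readability); the real ones are longer.
newick = expand_like("data/{domain}/{supergroup}/tree.{domain}.phylip",
                     domain=DOMAINS, supergroup=SUPERGROUPS)
outputs = expand_like("results/{domain}/{supergroup}/tree.{domain}.phylip.supergroups",
                      domain=DOMAINS, supergroup=SUPERGROUPS)

# List-valued placeholders are space-joined, so the rule yields ONE command
# carrying all six inputs and all six outputs.
command = "perl scripts/change_newick.pl {} {} {}".format(
    SUP, " ".join(newick), " ".join(outputs))

print(len(newick))  # 6
print(newick[0])    # data/Metallophos/2supergroups/tree.Metallophos.phylip
```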

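You can easily solve the problem by not using the expand() function on the input and output.

But then how does Snakemake know it has to generate all the files? Through a target rule, placed before rule add_supergroups, whose input is the expanded list of the files that rule add_supergroups creates.

Let's write some code: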

DOMAINS = ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS = ["2supergroups", "5supergroups"]

rule target:
    input:
        expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups:
    input:
        newick="data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip",
        sup="data/species.v3.1.1.supergroups.txt"
    output:
        "results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups"
    shell:
        "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"

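Now it should work. Snakemake needs the list of files from the target rule. It searches all rules to find out whether one of them can generate those files.

In this case, it recognizes the filename pattern of the output of add_supergroups, so it automatically fills in the DOMAINS and SUPERGROUPS wildcards. Rule add_supergroups will then be run file by file.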
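The matching step can be pictured with a simplified Python sketch: turn the output pattern into a regex, match each requested target against it, and build one separate command per file. This is an illustration under stated assumptions, not Snakemake's own code: pattern_to_regex and infer_wildcards are hypothetical helpers, each wildcard uses ".+" (Snakemake's default wildcard regex), a repeated {domain} is required to match the same text via a backreference, and the file names are shortened.

```python
import re

def pattern_to_regex(pattern):
    """Simplified sketch: turn an output pattern into a regex. The first {name}
    becomes a named group matching ".+"; a repeated {name} must match the same text."""
    parts, seen, pos = [], set(), 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")      # backreference to the first occurrence
        else:
            parts.append(f"(?P<{name}>.+)")   # default wildcard regex
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

def infer_wildcards(pattern, path):
    """Return the wildcard values that make `pattern` produce `path`, or None."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return m.groupdict() if m else None

DOMAINS = ["Metallophos", "PP2C", "Y_phosphatase"]
SUPERGROUPS = ["2supergroups", "5supergroups"]
SUP = "data/species.v3.1.1.supergroups.txt"
# Shortened (assumed) file names, abbreviating the ones in the question.
INPUT = "data/{domain}/{supergroup}/tree.{domain}.phylip"
OUTPUT = "results/{domain}/{supergroup}/tree.{domain}.phylip.supergroups"

# What the target rule requests: one concrete path per combination.
targets = [OUTPUT.format(domain=d, supergroup=s) for d in DOMAINS for s in SUPERGROUPS]

# For each requested file, infer the wildcards and build one separate job.
commands = []
for target in targets:
    wc = infer_wildcards(OUTPUT, target)
    newick = INPUT.format(**wc)
    commands.append(f"perl scripts/change_newick.pl {SUP} {newick} {target}")

print(len(commands))  # 6
```

Each entry in commands holds exactly one newick file and one output file, which is the "one by one" behaviour the question asks for.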


2017-03-16 16:13:27

Ha, you beat me to it ;-) – rioualen

Two good answers, from two new users, and fast too. Keep it up. :-) – simbabque

We happen to be part of the French "Snakemake community" :) – rioualen

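You can remove the expand() function and use a rule "all" to define your targets. The values of the wildcards in rule add_supergroups will then be inferred automatically from those target files.

You can even use different wildcard names in rule "add_supergroups"; Snakemake will recognize and match the pattern.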

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"] 
SUPERGROUPS=["2supergroups","5supergroups"] 

rule all: 
    input: expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups: 
    input: 
     newick="data/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip", 
     sup="data/species.v3.1.1.supergroups.txt" 
    output: 
     "results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups" 
    shell: 
     "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"

In theory, it should even work like this:

DOMAINS= ["Metallophos", "PP2C", "Y_phosphatase"] 
SUPERGROUPS=["2supergroups","5supergroups"] 

rule all: 
    input: expand("results/{domain}/{supergroup}/RAxML_bipartitionsBranchLabels.bbhlist.txt.{domain}.fa.aligned.rp.me-25.id.phylip.supergroups", domain=DOMAINS, supergroup=SUPERGROUPS)

rule add_supergroups: 
    input: 
     newick="data/{foo}", 
     sup="data/species.v3.1.1.supergroups.txt" 
    output: 
     "results/{foo}.supergroups" 
    shell: 
     "perl scripts/change_newick.pl {input.sup} {input.newick} {output}"


2017-03-16 17:37:29 rioualen
