2016-08-30

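bash to grep file for matches but include unique header line

The bash loop below goes through a directory and runs grep on every .txt file. What I am trying to do is include each file's header line in its filtered output. Currently the header is printed to stdout, and the two new filtered files have no header. The code below seems close, but I can't get the unique header into each output file. Thank you :).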

Bash

for file in /home/cmccabe/compare/*.txt ; do 
bname=$(basename $file) 
pref=${bname%%.txt} 
[ "$file" = /home/cmccabe/compare/${pref}_filtered.txt ] && continue 
head -n 1 "$file" 
grep -wFf /home/cmccabe/compare/list $file > /home/cmccabe/compare/${pref}_filtered.txt 
done 

file1

Index Chromosomal Position Gene  
4 43394661 SLC2A1 
22 166870221 SCN1A 
22 166870952 CBS 

file2

Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 
chr5 178555085 FBN1 AMPL4306766155 

list (used for grep)

SLC2A1 
SCN1A 
ADSL 
ALDH7A1 

Desired file1_filtered output

Index Chromosomal Position Gene 
4 43394661 SLC2A1 
22 166870221 SCN1A 

Desired file2_filtered output

Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 

Answers

2

With GNU grep and bash's process substitution:

grep -wf <(head -n 1 file1; cat list) file1 

Output:

 
Index Chromosomal Position Gene  
4 43394661 SLC2A1 
22 166870221 SCN1A 

grep -wf <(head -n 1 file2; cat list) file2 

Output:

 
Chrom Position Gene Symbol Target ID 
chr22 40742831 ADSL AMPL3764590328 
chr22 40745898 ADSL AMPL5177720331 
chr5 125885803 ALDH7A1 AMPL4306766150 

Or without process substitution: 'head -n 1 file1; grep -wf list file1' – Cyrus
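
Applied to the loop from the question, the idea in the comment above could be sketched roughly as follows: group head and grep in braces so that both write to the same redirected file. This is only a sketch under stated assumptions. The temporary directory and the shortened sample files stand in for /home/cmccabe/compare and are made up for illustration, and -F is added on the assumption that the gene names should be treated as fixed strings rather than regular expressions.

```shell
#!/bin/sh
# Illustrative setup (an assumption): a temporary directory stands in
# for /home/cmccabe/compare, with shortened copies of the sample files.
dir=$(mktemp -d)
printf '%s\n' 'Index Chromosomal Position Gene' '4 43394661 SLC2A1' \
    '22 166870221 SCN1A' '22 166870952 CBS' > "$dir/file1.txt"
printf '%s\n' 'Chrom Position Gene Symbol Target ID' \
    'chr22 40742831 ADSL AMPL3764590328' \
    'chr5 178555085 FBN1 AMPL4306766155' > "$dir/file2.txt"
printf '%s\n' SLC2A1 SCN1A ADSL ALDH7A1 > "$dir/list"

for file in "$dir"/*.txt; do
    # Skip output left over from an earlier run.
    case $file in *_filtered.txt) continue ;; esac
    {
        head -n 1 "$file"                            # header line, always kept
        tail -n +2 "$file" | grep -wFf "$dir/list"   # data rows that match the list
    } > "${file%.txt}_filtered.txt"
done

cat "$dir/file1_filtered.txt" "$dir/file2_filtered.txt"
```

Because the braces share one redirection, the header no longer goes to stdout, and each filtered file starts with its own header line.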

1

You're going about this the wrong way. Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice and then just do this:

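awk '
BEGIN { FS="\t" }
# First file (the gene list): remember every gene name.
NR==FNR { genes[$0]; next }
# Header line of each data file: start a new output file and locate the Gene column.
FNR==1 {
    if (out != "") close(out)
    out = FILENAME
    sub(/\.txt$/,"_filtered&",out)
    g = 0
    for (i=1; i<=NF; i++) {
        if ($i ~ /^Gene/) {     # matches both "Gene" and "Gene Symbol"
            g = i
        }
    }
}
# Print the header, plus every row whose Gene column is in the list.
(FNR==1) || (g && ($g in genes)) { print > out }
' /home/cmccabe/compare/list /home/cmccabe/compare/*.txt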

That will be more robust, efficient, and portable than what you are currently doing.
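
One way to see the robustness point is that grep -w matches a gene name anywhere on the line, while awk can compare a single column. The sketch below uses a made-up file (demo.txt, with a hypothetical Notes column) that is not part of the question's data, and it hardcodes the Gene column as field 2 purely for illustration.

```shell
#!/bin/sh
# Illustrative data (an assumption): the second data row mentions SCN1A
# only in a Notes column, not in the Gene column.
dir=$(mktemp -d)
printf 'Index\tGene\tNotes\n1\tSCN1A\tnone\n2\tCBS\tnear SCN1A\n' > "$dir/demo.txt"
printf 'SCN1A\n' > "$dir/list"

# grep matches the word anywhere on the line, so both data rows count.
grep_rows=$(tail -n +2 "$dir/demo.txt" | grep -cwFf "$dir/list")

# awk compares only the Gene column (field 2 here), so one row counts.
awk_rows=$(awk -F'\t' '
    NR==FNR { genes[$0]; next }
    FNR>1 && ($2 in genes) { n++ }
    END { print n+0 }
' "$dir/list" "$dir/demo.txt")

echo "grep kept $grep_rows rows, awk kept $awk_rows rows"
```

With this data grep keeps the CBS row as well, because SCN1A appears in its Notes field, whereas the column-aware awk test keeps only the row whose Gene field is SCN1A.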