2017-04-14 72 views
1

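Match strings in another file

I am fairly new to Linux and Perl programming, and I have exhausted my search options without finding an answer. I have a master file, "master.txt", that contains all known interactions in 2 columns, where the items on the same row are known to interact. I also have a list of items, "list.txt", that I want to use as search criteria: a row of the master file should be returned only if both column 1 and column 2 contain an item from the list. All files are tab-delimited. For example, if this is the master file, "master.txt":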

AppleP001 BallP002 
AppleP002 CatP001 
BallP001 DogP001 
BallP002 AppleP001 
CatP001 AppleP002 
DogP001 BallP001 
DogP002 ZebraP001 
ElephantP001 CardinalP001 
FishP001 AntelopeP001 

And this is the search file, "list.txt":

Apple 
Ball 
Cat 
Dog 

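The resulting file should contain only rows that have Apple*, Ball*, Cat* or Dog* in both columns, with duplicates removed.

I tried using grep: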

grep -f list.txt master.txt > Sub_list.txt 

But I get this:

AppleP001  BallP002 
AppleP002  CatP001 
BallP001  DogP001 
BallP002  AppleP001 
CatP001 AppleP002 
DogP001 BallP001 
DogP002 ZebraP001 

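How can I remove the duplicates (a row counts as a duplicate if the same two items appear on it, no matter which columns they are in), drop the unrelated rows from the output file, and get this?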

AppleP001 BallP002 
AppleP002 CatP001 
BallP001 DogP001 

Any help is much appreciated! Thank you.

+0

Welcome to SO. Please note that questions here are expected to include details of _your code_ (and how it fails). See the [Help pages](http://stackoverflow.com/help); they are short and informative. – zdim

Answers

1

This is a bit heavy if the files are very large, but the question does not mention that.

use warnings;
use strict;
use feature 'say';
use Path::Tiny;
use List::Util qw(uniq any all);

my ($file, $flist) = ('master.txt', 'list.txt');

my @search = path($flist)->lines({ chomp => 1 });

# Sort words within each line so that duplicate lines can then be filtered out
my @filtered = uniq map { join ' ', sort split } path($file)->lines;

# Each word on the line needs to match a word in @search list
my @result = grep { all { found($_, \@search) } split } @filtered;

say for @result;

sub found { return any { $_[0] =~ /^$_/ } @{$_[1]} }

The output agrees with my understanding of the problem description:

AppleP001 BallP002
AppleP002 CatP001
BallP001 DogP001

If you can't use Path::Tiny (which provides path) for some reason, open the files yourself (and check that open succeeded), read from the filehandle in list context instead of calling path(...)->lines, and then do chomp @search;

The last part, written out a bit more explicitly:

# Each word on the line needs to match a word in @search list 
my @result = grep { 
    my ($w1, $w2) = split; 
    any { $w1 =~ /^$_/ } @search and any { $w2 =~ /^$_/ } @search; 
} @filtered; 
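
For comparison, the "both words must start with a list entry" test above can also be written as a single anchored extended regular expression for grep. This is only a minimal sketch and not part of the answer: it assumes the entries in list.txt contain no regex metacharacters and no trailing whitespace, and it recreates the sample files from the question in a temporary directory so that it can be run as-is. The variable names `alt` and `filtered` are made up for illustration.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory
# (assumption: columns are separated by a single tab).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\t%s\n' \
    AppleP001 BallP002 AppleP002 CatP001 BallP001 DogP001 \
    BallP002 AppleP001 CatP001 AppleP002 DogP001 BallP001 \
    DogP002 ZebraP001 ElephantP001 CardinalP001 FishP001 AntelopeP001 > master.txt
printf '%s\n' Apple Ball Cat Dog > list.txt

# Join the list entries into one alternation: Apple|Ball|Cat|Dog
alt=$(paste -s -d '|' list.txt)

# Keep a row only if BOTH columns begin with one of the entries.
filtered=$(grep -E "^($alt)[^[:space:]]*[[:space:]]+($alt)[^[:space:]]*[[:space:]]*\$" master.txt)
printf '%s\n' "$filtered"
```

On the sample data this keeps the first six rows of master.txt and drops `DogP002 ZebraP001`, which the plain `grep -f list.txt` lets through because a match in either column is enough there. It only filters: the swapped-column duplicates are still present, so a deduplication step such as the sort-and-uniq in the Perl code is still needed.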
+0

Thank you, zdim! The comments are very useful and helped me learn from your code. –

0

Here is one in AWK:

$ awk '
NR==FNR { a[$1]; next }     # read list and hash to a
{                           # process master
    b=""                    # reset buffer
    for(i in a)             # iterate thru a
        if(index($0,i)) {   # if list item is found in current master record
            b=$0            # set the record to buffer
            delete a[i]     # remove list entry from a
        }
    if(b) print b           # print b
}' list master              # mind the order
AppleP001 BallP002 
AppleP002 CatP001 
BallP001 DogP001 
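
A variant that applies the rule from the question more literally could be sketched as follows: a row is kept only if both columns start with a list entry, and a pair counts as a duplicate regardless of column order. This is an illustrative sketch under stated assumptions, not a replacement for the one-liner above: columns are whitespace-separated, list entries are treated as plain prefixes, and the sample files are recreated in a temporary directory. The helper `has_prefix` and the names `seen`, `key` and `result` are made up for this example.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\t%s\n' \
    AppleP001 BallP002 AppleP002 CatP001 BallP001 DogP001 \
    BallP002 AppleP001 CatP001 AppleP002 DogP001 BallP001 \
    DogP002 ZebraP001 ElephantP001 CardinalP001 FishP001 AntelopeP001 > master.txt
printf '%s\n' Apple Ball Cat Dog > list.txt

result=$(awk '
    # Return 1 if string s starts with any key of the array list, else 0.
    function has_prefix(s, list,    p) {
        for (p in list)
            if (index(s, p) == 1) return 1
        return 0
    }
    # First file (list.txt): remember every non-empty entry as a key of a.
    NR == FNR { if ($1 != "") a[$1]; next }
    # Second file (master.txt): both columns must start with a list entry.
    has_prefix($1, a) && has_prefix($2, a) {
        # Build an order-independent key so that "X Y" and "Y X" collide.
        key = ($1 "" < $2 "") ? $1 SUBSEP $2 : $2 SUBSEP $1
        if (!(key in seen)) { seen[key]; print }
    }
' list.txt master.txt)
printf '%s\n' "$result"
```

On the sample data this prints the same three rows as above (tab-separated). Unlike the one-liner, it does not delete list entries as they are matched, so its output does not depend on the order in which rows appear in the master file.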
+0

Thank you, James! Short and sweet code, and it works well. –
