2016-07-25 174 views

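Perl: searching for multiple keywords using a regex

I am searching a file for a list of keywords. I can match whole keywords, but for some keywords I need to match only the first part of a word. For example: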

DES 
AES 
https:// --- here it should match the word starting with https:// but my code considers the whole word and skips it. 

For example, with the keywords above, I would want to match only DES, DES (the start of DESTINY) and https:// from the input below:

DES some more words 
DESTINY and more... 
https://example.domain.com 
http://anotherexample.domain.com # note that this line begins with http://, not https:// 

Here is what I have tried so far:

use warnings; 
use strict; 

open STDOUT, '>>', "my_stdout_file.txt"; 
#die qq[Usage: perl $0 <keyword-file> <search-file> <file-name>\n] unless @ARGV == 3; 

my $filename = $ARGV[2]; 
chomp ($filename); 
open my $fh, q[<], shift or die $!;  # This filehandle seems to open all 3 arguments. I need to open only 2.

my %keyword = map { chomp; $_ => 1 } <$fh>; 
print "$fh\n"; 
while (<>) { 
    chomp; 
    my @words = split; 
    for (my $i = 0; $i <= $#words; $i++) { 
      if ($keyword{^$words[ $i ] }) { 
        print "Keyword Found for file:$filename\n"; 
        printf qq[$filename Line: %4d\tWord position: %4d\tKeyword: %s\n], 
          $., $i, $words[ $i ]; 
      } 
    } 
} 
close ($fh); 

How is the program supposed to know whether you want a whole-word match or only a partial match? – Borodin
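One way to answer the comment above is to say so in the keyword list itself. The sketch below assumes a purely hypothetical convention that is not in the question: a keyword ending in `*` is a prefix keyword, and any other keyword must match the whole word. The sub name `build_matcher` is also made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical convention (not from the question): a trailing '*' marks a
# prefix keyword; a keyword without it must match the whole word.
sub build_matcher {
    my @specs = @_;
    my @alts;
    for my $spec (@specs) {
        if ($spec =~ /\A(.+)\*\z/) {
            push @alts, quotemeta($1);              # prefix keyword
        }
        else {
            push @alts, quotemeta($spec) . '\z';    # whole-word keyword
        }
    }
    my $alt = join '|', @alts;
    return qr/\A(?:$alt)/;    # every alternative is anchored at the word start
}

my $re = build_matcher('DES*', 'AES', 'https://*');
for my $word (qw(DES DESTINY AES AESTHETIC https://example.domain.com)) {
    printf "%-30s %s\n", $word, ($word =~ $re ? 'match' : 'no match');
}
```

With this list, `DESTINY` matches because `DES*` is a prefix keyword, while `AESTHETIC` does not because `AES` is treated as a whole word. `quotemeta` keeps characters such as `/` or `.` in a keyword from being read as regex syntax.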

Answer


Here is a working solution for what I think you are trying to achieve. Let me know if it isn't:

use warnings; 
use strict; 
use feature qw/ say /; 

# Read the keywords, one per line, from the __DATA__ section below.
my %keywords;

while (<DATA>) {
    chomp;
    my ($key) = split;           # first whitespace-separated token on the line
    next unless defined $key;    # skip blank lines
    $keywords{$key} = length $key;
}

open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";

while (<$in>) {
    chomp;
    my $firstword = (split)[0];
    next unless defined $firstword;    # blank line: nothing to match

    for my $key (keys %keywords) {
        # ^ anchors the match at the start of the word;
        # \Q...\E stops characters in the keyword acting as regex syntax.
        if ($firstword =~ m/^\Q$key\E/) {
            say substr($firstword, 0, $keywords{$key});
        }
    }
}
close $in;
__DATA__
DES
AES
https://

For an input file containing:

here are some words over multiple 
lines 
that may or 
may not match your keywords: 
DES DEA AES SSE 
FOO https: 
https://example.domain.com 

this produces the output:

DES 
https:// 
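The code above only looks at the first word of each line. As a rough sketch of the variant the question seems to ask for (every word checked, with line number and word position reported), the keywords can be joined into a single anchored alternation. The sub names `keyword_regex` and `find_keywords` are invented for this example, and the sample lines are the ones from the question, held in an array instead of being read from a file.

```perl
use strict;
use warnings;

# Build one regex that matches any keyword at the start of a word.
# Longer keywords are tried first so the longest prefix wins if keywords overlap.
sub keyword_regex {
    my @keys = sort { length($b) <=> length($a) } @_;
    die "no keywords given\n" unless @keys;
    my $alt = join '|', map { quotemeta($_) } @keys;
    return qr/\A($alt)/;
}

# Return a list of [word_position, keyword] pairs for one line of text.
sub find_keywords {
    my ($re, $line) = @_;
    my @words = split ' ', $line;
    my @hits;
    for my $i (0 .. $#words) {
        push @hits, [ $i, $1 ] if $words[$i] =~ $re;
    }
    return @hits;
}

my $re    = keyword_regex('DES', 'AES', 'https://');
my @lines = (
    'DES some more words',
    'DESTINY and more...',
    'https://example.domain.com',
    'http://anotherexample.domain.com',
);

my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    for my $hit (find_keywords($re, $line)) {
        printf "Line: %4d\tWord position: %4d\tKeyword: %s\n", $line_no, @$hit;
    }
}
```

On the sample lines this reports `DES` for lines 1 and 2 and `https://` for line 3, and nothing for the `http://` line. To read from files as in the question, one option is to `shift` the keyword file and `pop` the file label off `@ARGV` before a `while (<>)` loop, so that `<>` reads only the search file.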

I get an error when I run the code: **Use of uninitialized value in pattern match (m//)** – John


@John - did you manage to sort this problem out? – fugu


Yes, I did. Thanks – John
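The thread does not say how the warning was resolved. One plausible cause, offered here as an assumption, is a blank line in the input: `split` returns an empty list for such a line, so `(split)[0]` is `undef`, and using that in a pattern match triggers the warning. A minimal sketch of the guard, with a hypothetical helper named `first_word`:

```perl
use strict;
use warnings;

# Return the first whitespace-separated word of a line,
# or undef when the line is empty or contains only whitespace.
sub first_word {
    my ($line) = @_;
    my $first = (split ' ', $line)[0];
    return $first;
}

for my $line ("DES some words\n", "\n", "   \n", "https://example.domain.com\n") {
    my $word = first_word($line);
    if (!defined $word) {
        print "skipping blank line\n";    # nothing to match against
        next;
    }
    print "first word: $word\n";
}
```

Checking `defined` before the match keeps `use warnings` quiet without hiding other problems.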