Put .txt files into a hash and compare them against an array of words using Perl


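I have a folder of .txt files that I want to store in a hash. I then want to compare those files against an array of specific words and count how many times each of those words occurs.

Source

2010-11-16 jenniem

Can you give a concrete example of the file contents and of the hash you want to compare against? What have you tried? This is not a "please do my homework for me" site, and this question looks a lot like that. – DVK 2010-11-16 12:13:38

What exactly is your question? :) – 2010-11-16 12:16:27

@ØyvindSkaar - Purely out of curiosity and unrelated to Perl: if you don't mind, how does one properly pronounce your name? – DVK 2010-11-16 12:17:31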

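I'm not going to write the code for you, but you could do it like this:

Loop over all the files (see glob())
Loop over all the words in each file (perhaps with a regex or split()?)
Check whether each word is in a hash of wanted words. If it exists, increment a "counter" hash value like this: $hash{$word}++. Or you could store every word in the hash and pick out the ones you want afterwards.

OR... there are many ways to do this.

If your files are huge, you'll have to do it another way.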
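The steps above can be sketched roughly as follows. This is only one possible reading of the outline, not the answerer's own code: the word list, the `texts/` directory, and the `count_wanted` helper are all hypothetical names chosen for illustration, and matching is made case-insensitive as an assumption.

```perl
use strict;
use warnings;

# Hypothetical word list; replace it with the words you care about.
my @wanted = qw(apple banana cherry);

# Count how often each wanted word appears in the given files.
sub count_wanted {
    my ($wanted, @files) = @_;
    my %count = map { lc($_) => 0 } @$wanted;   # the "counter" hash
    for my $file (@files) {
        open(my $fh, '<', $file) or die "can't open $file: $!";
        while (my $line = <$fh>) {
            # split on runs of non-word characters; lowercase for matching
            for my $word (split /\W+/, lc $line) {
                $count{$word}++ if exists $count{$word};
            }
        }
        close $fh;
    }
    return %count;
}

# Assumed layout: the .txt files live in ./texts
my %count = count_wanted(\@wanted, glob('texts/*.txt'));
printf "%-10s %d\n", $_, $count{$_} for sort keys %count;
```

Because the files are read line by line, memory use depends on the size of the word list rather than on the size of the files.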

Source

2010-11-16 12:43:43

My files are small, so that should work... thanks – jenniem 2010-11-17 09:55:59

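Note that I use \p{Alpha} because that is what technically defines a word. You can adjust the regex to allow digits, to require a letter at the start, or to do whatever else you might need.

Also note that for files consisting of one word per line, the regex is overkill and you should omit it. Just chomp the line and store $_.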

use 5.010; # for say 
use strict; 
use warnings; 
use Data::Dumper; # needed for the Dump call at the end 

my (%hash); 

sub load_words { 
    @hash{ @_ } = (0) x @_; return; 
} 

sub count_words { 
    $hash{$_}++ foreach grep { exists $hash{$_} } @_; 
} 


my $word_regex 
    = qr{ (    # start a capture 
      \p{Alpha}+  # any sequence of one or more alpha characters 
      (?:   # begin grouping of 
       ['-]   # allow hyphenated words and contractions 
       \p{Alpha}+ # which must be followed by an alpha 
      )*    # any number of times 
      (?: (?<=s)')? # case for plural possessives (ht: tchrist) 
     )    # end capture 
     }x; 

# load @ARGV to do <> processing 
@ARGV = qw(list of files I take words from); 
while (<>) { 
    load_words(m/$word_regex/g); 
} 
@ARGV = qw(list of files where I count words); 
while (<>) { 
    count_words(m/$word_regex/g); 
} 

# take a look at the hash 
say Data::Dumper->Dump([ \%hash ], [ '*hash' ]);
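For the one-word-per-line case mentioned before the code, loading the word list without the regex might look like the sketch below. The `load_word_file` helper and the command-line argument are assumptions made for illustration; only the chomp-and-store idea comes from the answer.

```perl
use strict;
use warnings;

my %hash;

# Read a word list with one word per line; no regex needed.
sub load_word_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "can't open $file: $!";
    local $_;                  # don't clobber the caller's $_
    while (<$fh>) {
        chomp;                 # strip the trailing newline
        next unless length;    # skip blank lines
        $hash{$_} = 0;         # register the word with a zero count
    }
    close $fh;
    return;
}

# Hypothetical usage: pass the word-list file as the first argument.
load_word_file($ARGV[0]) if @ARGV;
```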

Source

2010-11-16 14:57:06 Axeman

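See [this answer](http://stackoverflow.com/questions/4213800/is-there-something-like-a-counter-variable-in-regular-expression-replace/4214173#4214173) for another word-based approach that looks at some edge cases. – tchrist 2010-11-18 13:07:17

@tchrist: Good point about plural possessives. :D – Axeman 2010-11-18 15:23:50

I'm really glad to see people starting to move away from writing [a-z] in their patterns. That's just *so* 1960s! ☹ – tchrist 2010-11-18 16:09:22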
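A small illustration of why \p{Alpha} is preferred over [a-z] in this thread; the sample string is made up for the demonstration.

```perl
use strict;
use warnings;
use utf8;                              # this source contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');

my $text = "naïve café";               # made-up sample text
my @ascii   = $text =~ /[a-z]+/g;      # breaks words at the accented letters
my @unicode = $text =~ /\p{Alpha}+/g;  # keeps each word whole

print "[a-z]+     : @ascii\n";         # na ve caf
print "\\p{Alpha}+ : @unicode\n";      # naïve café
```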

So I ended up doing it with an array of the specific words I wanted to find... happy days :-)

#!/usr/bin/perl 
#use strict; 
use warnings; 
my @words; 

my @triggers=(" [kK]ill"," [Aa]ssault", " [rR]ap[ie]"," [dD]rug"); 
my %hash; 

sub count_words { 
    print "\n"; 
} 

my $word_regex 
    = qr{ (    # start a capture 
      \p{Alpha}+  # any sequence of one or more alpha characters 
      (?:   # begin grouping of 
       ['-]   # allow hyphenated words and contractions 
       \p{Alpha}+ # which must be followed by an alpha 
      )*    # any number of times 
     )    # end capture 
     }x; 

my @files; 
my $dirname = "/home/directory"; 
opendir(DIR,$dirname) or die "can't opendir $dirname: $!"; 
while (defined($file = readdir(DIR))) { 
    push @files, "$dirname/$file"; # add the missing path separator 
} 
closedir(DIR); 
my @interestingfiles; 

foreach $file (@files){ 

    open FILE, ("<$file") or die "No file"; 

    foreach $line (<FILE>){ 
     foreach $trigger (@triggers){ 
      if($line =~ /$trigger/){ # no /g: it would leave pos() set for the next trigger 
       push @interestingfiles, "$file\n"; 
      } 
     } 
    } 
    close FILE; 
} 
print @interestingfiles;

Source

2010-11-18 12:16:06 jenniem

Why did you comment out 'use strict;'? You should *never* do that. Fix the problems it reveals instead. – Ether 2010-11-21 17:22:46
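As a rough sketch of what a version that runs under 'use strict;' could look like: every variable is declared with my and the bareword handles become lexical ones. The `interesting_files` helper is a hypothetical name. Restricting the scan to .txt files, reporting each file only once, and taking the directory from the command line are assumptions that go beyond the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the .txt files in $dirname that contain at least one trigger pattern.
sub interesting_files {
    my ($dirname, @triggers) = @_;
    opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
    my @files = map { "$dirname/$_" } grep { /\.txt\z/ } readdir($dh);
    closedir($dh);

    my @hits;
    for my $file (sort @files) {
        open(my $fh, '<', $file) or die "can't open $file: $!";
        LINE: while (my $line = <$fh>) {
            for my $trigger (@triggers) {
                next unless $line =~ /$trigger/;
                push @hits, $file;
                last LINE;     # one match is enough to flag the file
            }
        }
        close $fh;
    }
    return @hits;
}

my @triggers = (" [kK]ill", " [Aa]ssault", " [rR]ap[ie]", " [dD]rug");
my $dirname  = shift @ARGV;    # directory passed on the command line
if (defined $dirname) {
    print "$_\n" for interesting_files($dirname, @triggers);
}
```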
