Ruby scan to match multiple words

I have some code that parses the text files in a folder and saves the text surrounding a particular search word.

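However, I can't work out how to edit the code so that it handles several words at once. I don't want to loop over the whole code once per word, because I want the results grouped per text file rather than per search term.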

Using all_documents.scan("(word1|word2|word3)") or similar regex variations doesn't seem to work.

    # helper: return the start and end indices of a window of `padding`
    # characters around the word found at `index`, clamped to the text bounds
    def indices(text, index, word)
      padding = 20
      bottom_i = [index - padding, 0].max
      top_i = [index + word.length + padding, text.length].min
      return bottom_i, top_i
    end

    # script
    # base.txt is opened before the chdir, so it lives in the parent directory
    base_text = File.open("base.txt", 'w')
    Dir.mkdir("summaries") unless Dir.exist?("summaries")
    Dir.chdir("summaries")

    Dir.glob("*.txt").each do |textfile|
      whole_file = File.read(textfile)
      puts "Currently summarizing " + textfile + "..."
      curr_i = 0
      whole_file.scan(/trail/).each do |match|
        if i_match = whole_file.index(match, curr_i)
          top_bottom = indices(whole_file, i_match, match)
          base_text.puts(whole_file[top_bottom[0]..top_bottom[1]] + " : " + File.path(textfile))
          # resume the search just past this match so the next one is found
          curr_i = i_match + match.length
        end
      end
      puts "Done summarizing " + textfile + "."
    end
    base_text.close

Any ideas?

Source

2013-03-14 Seeb

You can use Regexp.union(). It does exactly what you want.

In your code, it would become:

... 
whole_file.scan(Regexp.union(/trail/, /word1/, /word2/, /word3/)).each do |match| 
...
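
As background on why the string form tried in the question finds nothing: scan treats a String argument as literal text, and a Regexp containing a capturing group makes scan return the groups instead of the whole matches. A minimal sketch of the difference, using a made-up sample string:

```ruby
# Sample text is made up for illustration.
text = "the trail led past word1 and word2"

# A String argument is matched literally, so this finds nothing.
literal = text.scan("(trail|word1|word2)")

# A capturing group makes scan return one array of groups per match.
grouped = text.scan(/(trail|word1|word2)/)

# Regexp.union also accepts plain strings (escaped for you) or an array.
pattern = Regexp.union(%w[trail word1 word2])
flat = text.scan(pattern)

p literal  # []
p grouped  # [["trail"], ["word1"], ["word2"]]
p flat     # ["trail", "word1", "word2"]
```

Passing strings rather than regexps to Regexp.union has the side benefit that any special characters in the search words are escaped automatically.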

Source

2013-03-14 22:35:51 toch

Perfect. It works. Thanks! – Seeb 2013-03-18 09:26:39

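I think it would be better to scan for any word (e.g. with /[\w']+/) and, inside the scan block, check whether $& matches one of your specific words. If scan happens to match a word you are not interested in, nothing goes wrong; just ignore it.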
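
One way this suggestion might look in practice is sketched below. The word list, the sample text, and the `hits` array are hypothetical names chosen for illustration, and a Set is assumed for the lookup; the answer itself does not prescribe any of them.

```ruby
require "set"

# Hypothetical word list and sample text, for illustration only.
wanted = Set.new(%w[trail word1 word2 word3])
text = "The trail wasn't near word1, but trailing word3 was."

hits = []
text.scan(/[\w']+/) do
  word = $&                        # the token that was just matched
  next unless wanted.include?(word) # ignore tokens we do not care about
  # $~ is the MatchData for the current token; begin(0) is its offset.
  hits << [word, $~.begin(0)]
end

p hits  # [["trail", 4], ["word1", 22], ["word3", 42]]
```

Because whole tokens are compared, "trailing" is skipped even though it contains "trail", and the offset from $~ can stand in for a separate call to index.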

Source

2013-03-14 22:40:21 sawa

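You can use Regexp.union, but on its own that only gives you substring matches. If you want to match whole words, you need to do a bit more work. I would use: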

/\b(?:#{ Regexp.union(%w[trail word1 word2 word3]).source })\b/ 
=> /\b(?:trail|word1|word2|word3)\b/

The resulting pattern will locate whole words and ignore any substrings:

foo = /\b(?:#{ Regexp.union(%w[trail word1 word2 word3]).source })\b/ 
# /\b(?:trail|word1|word2|word3)\b/ 

words = %w[trail word1 word2 word3] 
words.join(' ').scan(foo) 
# [ 
#  [0] "trail", 
#  [1] "word1", 
#  [2] "word2", 
#  [3] "word3" 
# ] 

words.join.scan(foo) 
# [] 

'trail word1word2 word3'.scan(foo) 
# [ 
#  [0] "trail", 
#  [1] "word3" 
# ]
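
To connect this back to the question, the whole-word pattern could be combined with per-file context snippets roughly as follows. This is only a sketch: the `documents` hash is a hypothetical stand-in for the files on disk, and `padding` and `summaries` are illustrative names, not part of the original code.

```ruby
# Hypothetical stand-in for the files on disk: name => contents.
documents = {
  "a.txt" => "We followed the trail until word1 appeared.",
  "b.txt" => "Nothing here but trailing text and word1word2."
}

words   = %w[trail word1 word2 word3]
pattern = /\b(?:#{Regexp.union(words).source})\b/
padding = 10 # characters of context on each side (arbitrary choice)

# Build one list of snippets per file, so results stay grouped by file.
summaries = documents.to_h do |name, text|
  snippets = []
  text.scan(pattern) do
    m = Regexp.last_match                     # MatchData for this hit
    start  = [m.begin(0) - padding, 0].max    # clamp to start of text
    finish = [m.end(0) + padding, text.length].min
    snippets << text[start...finish]
  end
  [name, snippets]
end

summaries.each { |name, snippets| puts "#{name}: #{snippets.inspect}" }
```

Here "b.txt" yields no snippets, since "trailing" and "word1word2" contain the search words only as substrings. Taking the offsets from Regexp.last_match inside the scan block avoids searching the text a second time with index.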

Source

2013-03-15 03:44:27
