Using regex to match an expression, Python

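I have many sentences, but I will write a function that operates on each sentence separately, so the input is just a single string. My main goal is to extract the words that follow a preposition: for example, from "near blue meadows" I want to extract blue meadows.
I have all my prepositions in a text file. Reading the file works fine, but I have a problem with the regular expression. Here is my code: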

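import re

with open("Input.txt") as f:
    # Join every line of the file into one alternation: word1|word2|...
    words = "|".join(line.rstrip() for line in f)
    # preposition, whitespace, a word (optionally starting with digits), whitespace, a word
    pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
    print(pattern.search(text3).group())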

This returns:

AttributeError       Traceback (most recent call last) 
<ipython-input-83-be0cdffb436b> in <module>() 
     5  pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words)) 
     6  text3 = "" 
----> 7  print(pattern.search(text3).group()) 

AttributeError: 'NoneType' object has no attribute 'group'

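The main problem is with the regex. My expected output is "hennur police", i.e. the 2 words right after "near". In my code, I use ({}) to match one of the prepositions from the list, \s for the space that follows, (\d+\w+|\w+) for a word such as 19th or hennur, and \s\w+ for a space followed by one more word. My regex fails to match, hence the None error. Why is it not working?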

Contents of the Input.txt file:

['near','nr','opp','opposite','behind','towards','above','off']

Expected output:

hennur police

Source

2014-02-27 Hypothetical Ninja

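You need to check what exactly is in 'words'. –

Works for me (though you should actually get 'near hennur police'), so you really need to double-check that 'Input.txt' is correct (one word per line). – Evert

Input.txt is in the form ['near', 'off', 'opposite'...] and so on. I have edited my question. Please check. –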

The file contains a Python list literal. Use ast.literal_eval to parse the literal.

>>> import ast 
>>> ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']") 
['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']
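
To see what the file format does to the original pattern, here is a minimal sketch. It assumes the file is a single line holding the list literal and simulates it with an in-memory list (the `lines` variable is a stand-in for Input.txt, not part of the original code). Joining that one line puts the brackets into the regex, where they form a character class, so the first group matches a single character instead of a preposition.

```python
import re

# Assumption: stands in for iterating over Input.txt, which holds a single
# line containing a list literal rather than one preposition per line.
lines = ["['near','nr','opp','opposite','behind','towards','above','off']\n"]

words = "|".join(line.rstrip() for line in lines)
pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(words))

# The brackets are now part of the pattern and form a character class,
# so group 1 matches one character such as "n", "'" or ",".
print(pattern.pattern)

text3 = ("003 canopy grace appt, classic royale garden, hennur main road, "
         "bangalore 43. near hennur police station")

match = pattern.search(text3)
print(match.group())        # ", classic royale", not the expected phrase

# With an empty string, as in the traceback, there is nothing to match.
print(pattern.search(""))   # None
```

Note that the traceback in the question shows text3 = "", which on its own is enough for search to return None.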

import ast
import re

with open("Input.txt") as f:
    # The file holds a list literal, so parse it instead of reading lines
    words = '|'.join(ast.literal_eval(f.read()))

pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

# If there could be multiple matches, use `findall` or `finditer`.
# With a capturing group, `findall` returns a list of the group's
# contents instead of the entire matched strings.
for place in pattern.findall(text3):
    print(place)

# If you want to get only the first match, use `search`.
# You need to use `group(1)` to get only group 1.
print(pattern.search(text3).group(1))

Output (the first line is printed by the for loop, the second comes from search(..).group(1)):

hennur police 
hennur police

Note that you need to re.escape each word if any of the words contains characters that have a special meaning in regular expressions.
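
As a rough illustration of that note, the sketch below builds the same pattern with and without re.escape. The helper `build_pattern` and the entry "opp." are hypothetical; they are not in the original Input.txt and only show what an unescaped dot would do.

```python
import re

def build_pattern(words, escape=True):
    """Join the words into an alternation, optionally escaping each one."""
    if escape:
        words = [re.escape(w) for w in words]
    return re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format('|'.join(words)))

# Hypothetical list: "opp." is not in the original Input.txt
words = ['near', 'nr', 'opp.']

escaped = build_pattern(words)
unescaped = build_pattern(words, escape=False)

# The unescaped "." matches any character, so "oppx" is accepted.
print(unescaped.search("oppx hennur police"))   # a match object
print(escaped.search("oppx hennur police"))     # None

# The escaped pattern still matches the literal "opp.".
print(escaped.search("opp. hennur police station").group(1))  # hennur police
```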

Source

2014-02-27 07:42:11 falsetru

It works.. thanks @falsetru –

@Sword, edit your question a little to make it clear. – falsetru
