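I wrote a function that searches a text for the contexts of a given word (w), where left and right are parameters that control how many words are recorded on each side. It uses a regular expression to find the word's contexts.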
import re
def get_context(text, w, left, right):
    text.insert(0, "*START*")
    text.append("*END*")
    all_contexts = []
    for i in range(len(text)):
        if re.match(w,text[i], 0):
            if i < left:
                context_left = text[:i]
            else:
                context_left = text[i-left:i]
            if len(text) < (i+right):
                context_right = text[i:]
            else:
                context_right = text[i:(i+right+1)]
            context = context_left + context_right
            all_contexts.append(context)
    return all_contexts
So, for example, if I have a text in the form of a list like this:
text = ['python', 'is', 'dynamically', 'typed', 'language', 'python', 'functions', 'really', 'care', 'about', 'what', 'you', 'pass', 'to', 'them', 'but', 'you', 'got', 'it', 'the', 'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one', 'thousand', 'arguments', 'to', 'your', 'function', 'then', 'you', 'can', 'explicit', 'define', 'every', 'parameter', 'in', 'your', 'function', 'definition', 'and', 'your', 'function', 'will', 'be', 'automatically', 'able', 'to', 'handle', 'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you']
the function works fine, for example:
get_context(text, "function",2,2)
[['language', 'python', 'functions', 'really', 'care'], ['to', 'your', 'function', 'then', 'you'], ['in', 'your', 'function', 'definition', 'and'], ['and', 'your', 'function', 'will', 'be']]
Now I want to build a dictionary holding the contexts of every word in the text, like this:
d = {}
for w in set(text):
    d[w] = get_context(text,w,2,2)
But I am getting this error:
Traceback (most recent call last):
  File "<pyshell#32>", line 2, in <module>
    d[w] = get_context(text,w,2,2)
  File "<pyshell#20>", line 9, in get_context
    if re.match(w,text[i], 0):
  File "/usr/lib/python3.4/re.py", line 160, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python3.4/re.py", line 294, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.4/sre_compile.py", line 568, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.4/sre_parse.py", line 760, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.4/sre_parse.py", line 370, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.4/sre_parse.py", line 579, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
I don't understand this error. Can anyone help me with it?
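For what it's worth, the error can be reproduced without the rest of the code. The sketch below assumes the failing `w` is the `*START*` marker, which an earlier `get_context` call inserts into `text` and which therefore ends up in `set(text)`. Used as a pattern, its leading `*` is a quantifier with nothing in front of it, so `re` refuses to compile it. `re.escape` is one way to make a word match literally; the variable names here are only illustrative.

```python
import re

marker = "*START*"  # assumed to be the word that triggers the error

raised = False
message = ""
try:
    # As a raw pattern, the leading "*" has nothing to repeat.
    re.match(marker, marker)
except re.error as exc:
    raised = True
    message = str(exc)

# re.escape turns every regex metacharacter into a literal character.
escaped_match = re.match(re.escape(marker), marker)

print(raised, message)
print(escaped_match is not None)
```

Any other token containing regex metacharacters (for example `+`, `?` or an unbalanced parenthesis) would presumably fail in a similar way when passed unescaped as the pattern.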
Okay, that was the problem. I hadn't thought about the *START* and *END* markers. I had considered == text[i], but I wanted to know why this didn't work. Thanks – Wunter
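A minimal sketch of the `==` variant mentioned in the comment, not the original poster's final code. It assumes two changes: tokens are compared with `==` instead of `re.match`, and the markers are added to a copy so the caller's list is not modified on every call. Note that exact comparison changes behavior slightly: `"function"` no longer matches `"functions"`, which the prefix match done by `re.match` did.

```python
def get_context(text, w, left, right):
    """Return the window around each token equal to w, without mutating text."""
    padded = ["*START*"] + list(text) + ["*END*"]  # work on a copy
    all_contexts = []
    for i, token in enumerate(padded):
        if token == w:  # exact comparison, no regex involved
            start = max(i - left, 0)  # clamp at the beginning of the list
            # Slicing past the end is safe, so no explicit right-hand check.
            all_contexts.append(padded[start:i + right + 1])
    return all_contexts


text = ["to", "your", "function", "then", "you"]
contexts = {w: get_context(text, w, 2, 2) for w in set(text)}
print(contexts["function"])
```

Because `text` is left untouched, `set(text)` never contains the markers, and repeated calls do not pile up extra `*START*` and `*END*` entries.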