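Accessing parsed elements using Pyparsing

I have a bunch of sentences that I need to parse and convert into the corresponding regex search code. Examples of my sentences -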
LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH Therefore we
- This means that, somewhere in the line, phrase one must come before both phrase2 and phrase3. In addition, the line must start with Therefore we
LINE_CONTAINS abc {upto 4 words} xyz {upto 3 words} pqr
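- This means I need to allow up to 4 words between the first 2 phrases and up to 3 words between the last 2 phrases
With help from Paul McGuire (here), the following grammar was written -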
from pyparsing import (CaselessKeyword, Word, alphanums, nums, MatchFirst, quotedString,
infixNotation, Combine, opAssoc, Suppress, pyparsing_common, Group, OneOrMore, ZeroOrMore)
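# LINE_ENDSWITH is used further down, so it has to be defined here as well
LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH = map(CaselessKeyword,
    """LINE_CONTAINS LINE_STARTSWITH LINE_ENDSWITH""".split())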
NOT, AND, OR = map(CaselessKeyword, "NOT AND OR".split())
BEFORE, AFTER, JOIN = map(CaselessKeyword, "BEFORE AFTER JOIN".split())
lpar=Suppress('{')
rpar=Suppress('}')
keyword = MatchFirst([LINE_CONTAINS, LINE_STARTSWITH, LINE_ENDSWITH, NOT, AND, OR,
BEFORE, AFTER, JOIN]) # declaring all keywords and assigning order for all further use
phrase_word = ~keyword + (Word(alphanums + '_'))
upto_N_words = Group(lpar + 'upto' + pyparsing_common.integer('numberofwords') + 'words' + rpar)
phrase_term = Group(OneOrMore(phrase_word) + ZeroOrMore(upto_N_words + OneOrMore(phrase_word)))
phrase_expr = infixNotation(phrase_term,
[
((BEFORE | AFTER | JOIN), 2, opAssoc.LEFT,), # (opExpr, numTerms, rightLeftAssoc, parseAction)
(NOT, 1, opAssoc.RIGHT,),
(AND, 2, opAssoc.LEFT,),
(OR, 2, opAssoc.LEFT),
],
lpar=Suppress('{'), rpar=Suppress('}')
) # structure of a single phrase with its operators
line_term = Group((LINE_CONTAINS | LINE_STARTSWITH | LINE_ENDSWITH)("line_directive") +
Group(phrase_expr)("phrase")) # basically giving structure to a single sub-rule having line-term and phrase
line_contents_expr = infixNotation(line_term,
[(NOT, 1, opAssoc.RIGHT,),
(AND, 2, opAssoc.LEFT,),
(OR, 2, opAssoc.LEFT),
]
) # grammar for the entire rule/sentence
sample1 = """
LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH Therefore we
"""
sample2 = """
LINE_CONTAINS abcd {upto 4 words} xyzw {upto 3 words} pqrs BEFORE something else
"""
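My question now is: how do I access the parsed elements so that I can convert the sentences into my regex code? For this, I tried the following -
import pprint

parsed = line_contents_expr.parseString(sample1)  # run once with sample1, once with sample2
print(parsed[0].asDict())
print(parsed)
pprint.pprint(parsed)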
For sample1, the result of the above code is -
{}
[[['LINE_CONTAINS', [[['sentence', 'one'], 'BEFORE', [['sentence2'], 'AND', ['sentence3']]]]], 'AND', ['LINE_STARTSWITH', [['Therefore', 'we']]]]]
([([(['LINE_CONTAINS', ([([(['sentence', 'one'], {}), 'BEFORE', ([(['sentence2'], {}), 'AND', (['sentence3'], {})], {})], {})], {})], {'phrase': [(([([(['sentence', 'one'], {}), 'BEFORE', ([(['sentence2'], {}), 'AND', (['sentence3'], {})], {})], {})], {}), 1)], 'line_directive': [('LINE_CONTAINS', 0)]}), 'AND', (['LINE_STARTSWITH', ([(['Therefore', 'we'], {})], {})], {'phrase': [(([(['Therefore', 'we'], {})], {}), 1)], 'line_directive': [('LINE_STARTSWITH', 0)]})], {})], {})
For sample2, the result of the above code is -
{'phrase': [[['abcd', {'numberofwords': 4}, 'xyzw', {'numberofwords': 3}, 'pqrs'], 'BEFORE', ['something', 'else']]], 'line_directive': 'LINE_CONTAINS'}
[['LINE_CONTAINS', [[['abcd', ['upto', 4, 'words'], 'xyzw', ['upto', 3, 'words'], 'pqrs'], 'BEFORE', ['something', 'else']]]]]
([(['LINE_CONTAINS', ([([(['abcd', (['upto', 4, 'words'], {'numberofwords': [(4, 1)]}), 'xyzw', (['upto', 3, 'words'], {'numberofwords': [(3, 1)]}), 'pqrs'], {}), 'BEFORE', (['something', 'else'], {})], {})], {})], {'phrase': [(([([(['abcd', (['upto', 4, 'words'], {'numberofwords': [(4, 1)]}), 'xyzw', (['upto', 3, 'words'], {'numberofwords': [(3, 1)]}), 'pqrs'], {}), 'BEFORE', (['something', 'else'], {})], {})], {}), 1)], 'line_directive': [('LINE_CONTAINS', 0)]})], {})
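My questions based on the above output are -
- Why does pprint (pretty print) show a more detailed parse than a normal print?
- Why does the asDict() method give no output for sample1 but does give output for sample2?
- Whenever I try to access the parsed elements using print (parsed.numberofwords) or parsed.line_directive or parsed.line_term, it gives me nothing. How can I access these elements so that I can use them to build my regex code?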
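To make the last question concrete: once a phrase term is available as a plain nested list (the shape printed for sample2 above), turning it into a regex is mostly a matter of walking that list. The following is only a sketch under assumptions that are not stated in the question: the helper name `phrase_term_to_regex` is made up, words are assumed to be separated by whitespace, and `{upto N words}` is read as "between 0 and N intervening words".

```python
import re


def phrase_term_to_regex(term):
    """Build a regex string from a phrase-term list such as
    ['abcd', ['upto', 4, 'words'], 'xyzw'] (hypothetical helper)."""
    parts = []
    for item in term:
        if isinstance(item, list):
            # ['upto', N, 'words'] -> allow 0..N intervening words (assumed meaning)
            _, count, _ = item
            parts.append(r"(?:\w+\s+){0,%d}" % count)
        else:
            # a literal word plus the whitespace separating it from the next token
            parts.append(re.escape(item) + r"\s+")
    pattern = "".join(parts)
    # drop the separator that was added after the final word
    if pattern.endswith(r"\s+"):
        pattern = pattern[: -len(r"\s+")]
    # word boundaries so that 'abcd' does not match inside 'xabcd'
    return r"\b" + pattern + r"\b"


term = ["abcd", ["upto", 4, "words"], "xyzw", ["upto", 3, "words"], "pqrs"]
regex = phrase_term_to_regex(term)
print(regex)
print(bool(re.search(regex, "abcd one two xyzw pqrs")))    # two words in between
print(bool(re.search(regex, "abcd a b c d e xyzw pqrs")))  # five words in between
```

Operators such as BEFORE, AND and NOT would need their own handling on top of this; the sketch only covers a single phrase term.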
Paul, are you suggesting that I work from the contents of the 'results.dump()' string in order to process the elements for further work? – user1993
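No, absolutely not! I was simply pointing you to 'results.dump()' as a way to display the contents of 'results'. You should be able to iterate over 'results' directly, as you would a list, and refer to fields by name using dict or object syntax. The 'dump()' output should guide you as to which mode to use and when. – PaulMcG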
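A minimal sketch of what this comment describes, on a cut-down version of the grammar (only LINE_CONTAINS, LINE_STARTSWITH and AND, and without `infixNotation`). The names `word` and `rule` are made up for this example and are not part of the grammar in the question. The results names are attached to each `line_term` group, so they are read from the group and not from the top-level result:

```python
from pyparsing import (CaselessKeyword, Group, OneOrMore, ParseResults,
                       Word, ZeroOrMore, alphanums)

LINE_CONTAINS = CaselessKeyword("LINE_CONTAINS")
LINE_STARTSWITH = CaselessKeyword("LINE_STARTSWITH")
AND = CaselessKeyword("AND")
keyword = LINE_CONTAINS | LINE_STARTSWITH | AND

# a phrase word is anything word-like that is not one of the keywords
word = ~keyword + Word(alphanums + "_")

line_term = Group(
    (LINE_CONTAINS | LINE_STARTSWITH)("line_directive")
    + Group(OneOrMore(word))("phrase")
)
rule = line_term + ZeroOrMore(AND + line_term)  # simplified stand-in for the full rule

results = rule.parseString(
    "LINE_CONTAINS phrase one AND LINE_STARTSWITH Therefore we")

# dump() is only for looking at the structure, not for further processing
print(results.dump())

# iterate like a list; nested groups are ParseResults, operators are plain strings
for item in results:
    if isinstance(item, ParseResults):
        # object syntax and dict syntax both work on the group that owns the names
        print(item.line_directive, item["phrase"].asList())
    else:
        print("operator:", item)
```

With the full grammar from the question, the printed output for sample1 suggests that `parsed[0]` is the `[line_term, 'AND', line_term]` group built by `infixNotation`. That group carries no names of its own, which appears to be why `asDict()` prints `{}` there, and the named fields sit one level further down, for example `parsed[0][0].line_directive`.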
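Paul, you mentioned how I can easily display the parse results using 'dump' and 'runTests'. But as I mentioned in the question, I am trying to access the parsed elements in order to manipulate them into regex; specifically, my question 3. How do you suggest I access things like 'numberofwords', 'line_term', etc. for a parsed line? – user1993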
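For 'numberofwords' in particular, the name is set on the integer inside each `{upto N words}` group, so it is only visible on that inner group. One possible way to reach it is to walk the nested results recursively. This is a sketch on the phrase part of the grammar only (keyword exclusion left out for brevity); `collect_word_limits` is a hypothetical helper, not something pyparsing provides:

```python
from pyparsing import (Group, OneOrMore, ParseResults, Suppress, Word,
                       ZeroOrMore, alphanums, pyparsing_common)

lpar, rpar = Suppress("{"), Suppress("}")
word = Word(alphanums + "_")
upto_N_words = Group(
    lpar + "upto" + pyparsing_common.integer("numberofwords") + "words" + rpar)
phrase_term = Group(
    OneOrMore(word) + ZeroOrMore(upto_N_words + OneOrMore(word)))


def collect_word_limits(tokens):
    """Recursively gather every 'numberofwords' value (hypothetical helper)."""
    limits = []
    for item in tokens:
        if isinstance(item, ParseResults):
            if "numberofwords" in item:   # name defined on this group
                limits.append(item["numberofwords"])
            limits.extend(collect_word_limits(item))
    return limits


parsed = phrase_term.parseString("abcd {upto 4 words} xyzw {upto 3 words} pqrs")
print(collect_word_limits(parsed))
print(parsed[0][1].numberofwords)  # direct access once the nesting is known
```

Note that `line_term` is a Python variable in the grammar and was never assigned as a results name, so `parsed.line_term` has nothing to return; only `line_directive`, `phrase` and `numberofwords` are defined as names.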