Pyparsing: parsing parentheses

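I am trying to parse the following lines with pyparsing, looking for the unique longest match:

command(grep -o '(' file.txt)
command(ls -1)

The commands do not extend over multiple lines. My initial idea for the rule was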

cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables)) + ")"

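but since pp.printables also contains (and should contain) the closing parenthesis ")", pyparsing fails to parse the command. Can I force pyparsing to match the longest possible command string such that it is followed by a closing parenthesis?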


2017-09-01 DangerRanger

Looking at your question, I first created a small script containing your sample text, the parser, and a runTests call:

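import pyparsing as pp

# Sample lines from the question; runTests parses each line separately.
tests = """\
    command(grep -o '(' file.txt)
    command(ls -1)
    """

# First attempt: Word(pp.printables) also matches the closing ")".
cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables)) + ")"
cmd.runTests(tests)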

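As you said, this fails because the terminating ")" gets included in the OneOrMore repetition. (runTests is useful here because it either shows the parsed results or puts a marker where the parser went astray.)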

command(grep -o '(' file.txt)
                             ^
FAIL: Expected ")" (at char 29), (line:1, col:30)

command(ls -1)
              ^
FAIL: Expected ")" (at char 14), (line:1, col:15)

This happens because pyparsing works purely left to right, with no implicit lookahead.
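To see this left-to-right behavior in isolation, here is a minimal sketch. The input string is just the tail end of the first test line, chosen for illustration. Word(pp.printables) does not know that a ")" is expected next, so it consumes it along with everything else that is not whitespace:

```python
import pyparsing as pp

# A Word of printables reads every non-whitespace character it can,
# including the ")" that was meant to close the command.
word = pp.Word(pp.printables)
tokens = word.parseString("file.txt)").asList()
print(tokens)  # ['file.txt)']
```

By the time the parser reaches the literal ")" in the grammar, there is nothing left in the input to match it.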

The simplest direct fix is to exclude ")" from the set of printables that your words can be made of:

cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables, excludeChars=")")) + ")"

This now gives successful output:

command(grep -o '(' file.txt) 
['command(', 'grep', '-o', "'('", 'file.txt', ')'] 

command(ls -1) 
['command(', 'ls', '-1', ')']

But if I add a different test string to your tests:

command(grep -o ')' file.txt)

then the quoted ')' is mistaken for the closing right parenthesis:

command(grep -o ')' file.txt)
                  ^
FAIL: Expected end of text (at char 18), (line:1, col:19)

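In general, with any kind of "read until X" pyparsing expression, we need to make sure that an X inside quotes is not misinterpreted as the actual X. One way to do this is to preempt the match of printable words by looking for quoted strings first: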

cmd = "command(" + pp.OneOrMore(pp.quotedString | 
           pp.Word(pp.printables, excludeChars=")")) + ")"

Now our quoted right parenthesis is correctly skipped over as a quoted string:

command(grep -o ')' file.txt) 
['command(', 'grep', '-o', "')'", 'file.txt', ')']
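The order of the alternatives matters here, because pyparsing's "|" operator takes the first alternative that matches. The following sketch compares the two orderings on the same line; the variable names are only for illustration:

```python
import pyparsing as pp

line = "command(grep -o ')' file.txt)"
word = pp.Word(pp.printables, excludeChars=")")

# quotedString is tried first, so "')'" is consumed as one token.
quoted_first = "command(" + pp.OneOrMore(pp.quotedString | word) + ")"
# Here Word is tried first and grabs the opening quote on its own.
word_first = "command(" + pp.OneOrMore(word | pp.quotedString) + ")"

good = quoted_first.parseString(line, parseAll=True).asList()

try:
    word_first.parseString(line, parseAll=True)
    word_first_ok = True
except pp.ParseException:
    word_first_ok = False

print(good)
print(word_first_ok)
```

With Word first, the parser stops at the quoted ")" just as before, so the full-line parse fails.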

But there are still many possible corner cases that could trip up this parser, so it may be simpler to use a pyparsing SkipTo expression:

cmd = "command(" + pp.SkipTo(")", ignore=pp.quotedString) + ")"

which runs the tests as:

command(grep -o '(' file.txt) 
['command(', "grep -o '(' file.txt", ')'] 

command(ls -1) 
['command(', 'ls -1', ')'] 

command(grep -o ')' file.txt) 
['command(', "grep -o ')' file.txt", ')']

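Note that we also had to explicitly tell SkipTo to step over any ")" characters that might be inside quoted strings. Also, the body of our command arguments is now returned as a single string.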
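Since the body now comes back as one string, it can be convenient to label it and drop the delimiters. This is a sketch of one way to do that, assuming you only care about the body; the results name "body" is my own choice, not something from the question:

```python
import pyparsing as pp

# Suppress drops the delimiters from the results; the results name "body"
# (a name chosen for this sketch) labels the skipped-over text.
cmd = (pp.Suppress("command(")
       + pp.SkipTo(")", ignore=pp.quotedString)("body")
       + pp.Suppress(")"))

result = cmd.parseString("command(grep -o ')' file.txt)")
body = result["body"]
print(body)
```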

If your command body could itself contain parenthesized values, then we will still trip over them. Look at this test:

command(grep -x '|'.join(['(', ')']) file.txt)

runTests again shows us that we have been misled by a ")" that we did not want to end on:

command(grep -x '|'.join(['(', ')']) file.txt)
                                     ^
FAIL: Expected end of text (at char 37), (line:1, col:38)

You can add a lookahead to the ")" to tell SkipTo to match only a ")" that comes right before the end of the string:

cmd = "command(" + pp.SkipTo(")" + pp.FollowedBy(pp.StringEnd()), 
          ignore=pp.quotedString) + ")"

But with this parser, we have really just reverted to something you could do just as well with string indexing, split, and strip methods.
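For comparison, a plain-string version might look like the sketch below. The function name extract_command_body is hypothetical, and it assumes each line holds exactly one command that ends at the last ")" on the line:

```python
def extract_command_body(line, prefix="command("):
    """Return the text between the prefix and the final ')' of a line,
    or None if the line does not have that shape."""
    stripped = line.strip()
    if not (stripped.startswith(prefix) and stripped.endswith(")")):
        return None
    # Slice off the prefix at the front and the closing ")" at the end.
    return stripped[len(prefix):-1]


body = extract_command_body("command(grep -x '|'.join(['(', ')']) file.txt)")
print(body)
```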

Here is one final version, to show you pyparsing's nestedExpr, which will help in the case of nested parentheses inside your argument list:

cmd = "command" + pp.originalTextFor(pp.nestedExpr())

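Normally, nestedExpr returns the parsed content as a nested list of strings, but by wrapping it in originalTextFor, we get the original text back. Also note that we removed the "(" from the opening "command(", since nestedExpr will use it as its opening parenthesis when parsing, with these results: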

command(grep -o '(' file.txt) 
['command', "(grep -o '(' file.txt)"] 

command(ls -1) 
['command', '(ls -1)'] 

command(grep -o ')' file.txt) 
['command', "(grep -o ')' file.txt)"] 

command(grep -x '|'.join(['(', ')']) file.txt) 
['command', "(grep -x '|'.join(['(', ')']) file.txt)"]
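To make the difference concrete, this sketch shows what nestedExpr returns on its own, and then one way to strip the outer parentheses from the originalTextFor result. The input "(a (b c) d)" is a made-up example:

```python
import pyparsing as pp

# Without originalTextFor, nestedExpr yields nested Python lists.
nested = pp.nestedExpr().parseString("(a (b c) d)").asList()
print(nested)  # [['a', ['b', 'c'], 'd']]

# With originalTextFor we get the raw text back, parentheses included,
# so slicing off the first and last character leaves the command body.
cmd = "command" + pp.originalTextFor(pp.nestedExpr())
raw = cmd.parseString("command(ls -1)")[1]
body = raw[1:-1]
print(body)  # ls -1
```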

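Ultimately, the approach you take and the complexity of the parser you need will depend on your goals for the parser, but these examples should give you some ideas on how to extend from here.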


2017-09-02 15:35:01 PaulMcG
