Using pyparsing to parse a word escape-split over multiple lines

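I'm trying to use pyparsing to parse words that can be split across multiple lines with a backslash-newline combination (a backslash immediately followed by a line break). Here's what I've done: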

from pyparsing import *

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))

print(multi_line_word.parseString(
'''super\\
cali\\
fragi\\
listic'''))

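The output I get is ['super'], whereas the expected output is ['super', 'cali', 'fragi', 'listic']. Even better would be to have them all joined into a single word (which I think I can do with multi_line_word.setParseAction(lambda t: ''.join(t))).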

I tried looking at this code in the pyparsing helper, but it gives me an error: maximum recursion depth exceeded.

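Edit 2009-11-15: I realized later that pyparsing is a little generous with regard to whitespace, which led me to some poor assumptions: what I thought I was parsing was actually much looser. That is, we want to see no whitespace between any portion of the word, the escape, and the EOL character.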
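To make the whitespace point concrete, here is a small illustrative sketch (not from the original post; it assumes Python 3 with pyparsing installed, and uses the camelCase method names, which pyparsing 3 still accepts). By default each pyparsing element skips leading whitespace before it tries to match, which is why a stray space before the backslash goes unnoticed; leaveWhitespace() switches that off for a single element. The matches() helper is hypothetical, written only for this demonstration.

```python
from pyparsing import Literal, ParseException, Word, alphas


def matches(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# By default each element skips leading whitespace before matching,
# so the space between the word and the backslash is silently ignored.
loose = Word(alphas) + Literal('\\')

# leaveWhitespace() turns off skipping for the backslash literal only,
# so the backslash must now come immediately after the word.
strict = Word(alphas) + Literal('\\').leaveWhitespace()

print(loose.parseString('shiny \\').asList())  # ['shiny', '\\']
print(matches(strict, 'shiny\\'), matches(strict, 'shiny \\'))  # True False
```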

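I realized that the little string above was insufficient as a test case, so I wrote the following unit tests. Code that passes these tests should match what I intuitively think of as an escape-split word, and only an escape-split word. The tests will not match a plain word that is not escape-split; we can (and, I believe, should) use a different grammar construct for that, which keeps the two cleanly separate.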

import unittest
import pyparsing

# Assumes you named your module 'multiline.py'
import multiline


class MultiLineTests(unittest.TestCase):

    def test_continued_ending(self):
        case = '\\\n'
        expected = ['\\', '\n']
        result = multiline.continued_ending.parseString(case).asList()
        self.assertEqual(result, expected)

    def test_continued_ending_space_between_parse_error(self):
        case = '\\ \n'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.continued_ending.parseString,
            case
        )

    def test_split_word(self):
        cases = ('shiny\\', 'shiny\\\n', ' shiny\\')
        expected = ['shiny']
        for case in cases:
            result = multiline.split_word.parseString(case).asList()
            self.assertEqual(result, expected)

    def test_split_word_no_escape_parse_error(self):
        case = 'shiny'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.split_word.parseString,
            case
        )

    def test_split_word_space_parse_error(self):
        cases = ('shiny \\', 'shiny\r\\', 'shiny\t\\', 'shiny\\ ')
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.split_word.parseString,
                case
            )

    def test_multi_line_word(self):
        cases = (
            'shiny\\',
            'shi\\\nny',
            'sh\\\ni\\\nny\\\n',
            ' shi\\\nny\\',
            'shi\\\nny ',
            'shi\\\nny captain',
        )
        expected = ['shiny']
        for case in cases:
            result = multiline.multi_line_word.parseString(case).asList()
            self.assertEqual(result, expected)

    def test_multi_line_word_spaces_parse_error(self):
        cases = (
            'shi \\\nny',
            'shi\\ \nny',
            'sh\\\n iny',
            'shi\\\n\tny',
        )
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.multi_line_word.parseString,
                case
            )


if __name__ == '__main__':
    unittest.main()


2009-11-14 gotgenes

After poking around a bit more, I came upon this help thread, which had this notable bit:

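I often see inefficient grammars when someone implements a pyparsing grammar directly from a BNF definition. BNF does not have a concept of "one or more" or "zero or more" or "optional"...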

With that, I got the idea to change these two lines

multi_line_word = Forward() 
multi_line_word << (word | (split_word + multi_line_word))

to

multi_line_word = ZeroOrMore(split_word) + word

which also gave the output I was looking for: ['super', 'cali', 'fragi', 'listic'].

Next, I added a parse action that joins these tokens together:

multi_line_word.setParseAction(lambda t: ''.join(t))

which gives the final output ['supercalifragilistic'].
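Putting these pieces together, a minimal self-contained sketch of this version might look like the following. It is an illustration rather than the exact script from the post: it assumes Python 3 and a pyparsing release that still accepts the camelCase names, and it swaps the wildcard import for explicit ones.

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore, alphas, lineEnd

# A backslash followed by an end of line marks a continuation
continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
# A word fragment whose continuation marker is dropped from the results
split_word = word + Suppress(continued_ending)
# Iterative form: any number of split fragments, then one final plain word
multi_line_word = ZeroOrMore(split_word) + word
# Join the fragments back into a single token
multi_line_word.setParseAction(lambda t: ''.join(t))

text = 'super\\\ncali\\\nfragi\\\nlistic'
print(multi_line_word.parseString(text).asList())  # ['supercalifragilistic']
```

Because ZeroOrMore loops instead of recursing, the nesting depth no longer grows with the number of line continuations.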

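The take-home message I learned is that one does not simply walk into Mordor.

Just kidding.

The take-home message is that one cannot simply implement a one-to-one translation of BNF with pyparsing; some tricks using the iterative types (OneOrMore, ZeroOrMore, Optional) should be called into use.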

Edit 2009-11-25: To accommodate the more strenuous test cases, I modified the code to the following:

no_space = NotAny(White(' \t\r')) 
# make sure that the EOL immediately follows the escape backslash 
continued_ending = Literal('\\') + no_space + lineEnd 
word = Word(alphas) 
# make sure that the escape backslash immediately follows the word 
split_word = word + NotAny(White()) + Suppress(continued_ending) 
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word) 
multi_line_word.setParseAction(lambda t: ''.join(t))

This ensures that no whitespace can appear between any of the elements of interest (except for the newline after the escaping backslash).
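As a quick sanity check, the sketch below reuses the grammar above unchanged and runs it on a few of the test inputs. The try_parse() helper is hypothetical, added only for this demonstration; the sketch assumes Python 3 with pyparsing installed. Note that parseString() is called without parseAll=True, so trailing text such as ' captain' is simply left unconsumed rather than treated as an error.

```python
from pyparsing import (Literal, NotAny, OneOrMore, Optional, ParseException,
                       Suppress, White, Word, alphas, lineEnd)

no_space = NotAny(White(' \t\r'))
# make sure that the EOL immediately follows the escape backslash
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
# make sure that the escape backslash immediately follows the word
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))


def try_parse(text):
    """Hypothetical helper: return the joined word, or None on a parse failure."""
    try:
        return multi_line_word.parseString(text)[0]
    except ParseException:
        return None


for case in ['sh\\\ni\\\nny', 'shi\\\nny captain', 'shi \\\nny', 'sh\\\n iny']:
    print(repr(case), '->', try_parse(case))
```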


2009-11-15 04:10:18 gotgenes

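Using Combine will also enforce no intervening whitespace. – PaulMcG 2009-11-16 06:24:46

Interesting. I tried 'multi_line_word = Combine(Combine(OneOrMore(split_word)) + Optional(word))', but it breaks on the 'sh\\\n iny' case: it doesn't raise an exception but returns ['sh']. Am I missing something? – gotgenes 2009-11-16 20:04:49

Well, your word is not just letters spanning a '\'-newline: there is whitespace before the letter 'i', which is treated as a word break, so Combine stops after 'sh'. You *can* modify Combine with the adjacent=False constructor argument, but beware: you might end up sucking in the whole file as a single word! Alternatively, if you also want to fold any leading whitespace, you can redefine continued_ending to include any whitespace after the lineEnd. – PaulMcG 2009-11-17 01:56:25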
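Both of these suggestions can be sketched as follows. This is one possible reading of the comments, not code from the thread: strict_word wraps the ZeroOrMore form in Combine, which switches off whitespace skipping inside itself, while folding_word assumes that only spaces and tabs count as indentation to be folded after the line break. The names strict_word, folding_ending, folding_word and accepts() are hypothetical, and parseAll=True is used so that unconsumed trailing text counts as a failure.

```python
from pyparsing import (Combine, Literal, Optional, ParseException, Suppress,
                       White, Word, ZeroOrMore, alphas, lineEnd)

word = Word(alphas)

# Suggestion 1: Combine joins the pieces into one string and disables
# whitespace skipping between them, so 'shi \\\nny' no longer matches.
continued_ending = Literal('\\') + lineEnd
strict_word = Combine(ZeroOrMore(word + Suppress(continued_ending)) + word)

# Suggestion 2 (assumption: only spaces/tabs are folded): the continuation
# also swallows the whitespace that starts the next line.
folding_ending = Literal('\\') + lineEnd + Optional(White(' \t'))
folding_word = Combine(ZeroOrMore(word + Suppress(folding_ending)) + word)


def accepts(expr, text):
    """Hypothetical helper: True if expr consumes the whole of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


print(accepts(strict_word, 'sh\\\n iny'), accepts(folding_word, 'sh\\\n iny'))
```

Note that strict_word rejects 'sh\\\n iny' outright: ZeroOrMore greedily consumes 'sh', the backslash and the newline, and pyparsing does not backtrack, so the final word has nothing left to match.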

You are pretty close with your code. Any of these mods will work:

# '|' means MatchFirst: the first alternative that matches wins, so the bare
# word matched 'super' and the recursive alternative was never tried;
# reversing the order of the alternatives makes this work
multi_line_word << ((split_word + multi_line_word) | word)

# '^' means Or/MatchLongest, but beware using this inside a Forward 
multi_line_word << (word^(split_word + multi_line_word)) 

# an unusual use of delimitedList, but it works 
multi_line_word = delimitedList(word, continued_ending) 

# in place of your parse action, you can wrap in a Combine 
multi_line_word = Combine(delimitedList(word, continued_ending))
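To check two of these alternatives side by side, here is a small runnable sketch. It assumes Python 3 and a pyparsing version that still provides delimitedList; the grammar objects are renamed recursive_word and combined_word (hypothetical names) only so that both can live in one script.

```python
from pyparsing import (Combine, Forward, Literal, Suppress, Word, alphas,
                       delimitedList, lineEnd)

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)

# Recursive form with the longer alternative tried first (MatchFirst order matters)
recursive_word = Forward()
recursive_word << ((split_word + recursive_word) | word)

# The continuation acts as the (suppressed) delimiter; Combine joins the pieces
combined_word = Combine(delimitedList(word, continued_ending))

text = 'super\\\ncali\\\nfragi\\\nlistic'
print(recursive_word.parseString(text).asList())  # ['super', 'cali', 'fragi', 'listic']
print(combined_word.parseString(text).asList())   # ['supercalifragilistic']
```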

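As you found in your pyparsing Googling, BNF-to-pyparsing translations should be done with a special view to using pyparsing features in place of, um, BNF's shortcomings. I was actually in the middle of composing a longer answer going into more of the BNF translation issues, but you have already found this material (on the wiki, I assume).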


2009-11-15 16:51:08 PaulMcG
