Python: Regex matching inside parentheses

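I'm working with Python and regular expressions to extract everything inside parentheses, where the content can itself contain other parentheses. I want to transform a string like the following: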

(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"',2911510,NULL,NULL,NULL),

into a list like the following:

[ 
    [1694439, 805577453641105408, '\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"', 2887640, NULL, NULL, NULL], 
    [1649240, 805577446758158336, '\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"', 2911510, NULL, NULL, NULL] 
]

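The main problem here is that, as you can see, the text itself contains some parentheses that I don't want to split on. I've already tried something like \([^)]+\), but obviously it stops at the first ) it finds.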
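To make the failure concrete, here is a minimal sketch on a shortened, made-up row (not the real data). Since `[^)]+` cannot cross a `)`, the first match ends at the `)` of `(Brexit)` instead of at the end of the row, and the rest of that row is skipped:

```python
import re

# Shortened, made-up sample: the quoted text contains its own parentheses
row = "(1,2,'riot if (Brexit) is blocked',3),(4,5,'no parens here',6),"

# [^)]+ matches anything except ')', so each match stops at the first ')'
naive = re.findall(r"\([^)]+\)", row)
print(naive)
# prints ["(1,2,'riot if (Brexit)", "(4,5,'no parens here',6)"]
```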

Any clue how to solve this?

2017-06-19 ParKein

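This is not what regular expressions were designed for. Although there are extensions that allow matching balanced parentheses, without them the *pumping lemma* shows that a regular expression cannot do this. –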

Regular expressions can't count, so they can't extract text delimited by matching quotes and parentheses. You need a parser. See PLY, PyParsing, Lark, etc. – phd

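Although, as @WillemVanOnsem said, regex isn't designed for this, if you know the first column is always a run of digits, you can use it as an anchor. Take a look at [regex lookbehind](http://www.rexegg.com/regex-lookarounds.html) – EndermanAPM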
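A minimal sketch of that anchoring idea, assuming (as the comment does) that every row starts with digits, and additionally assuming that the quoted text never contains `),(` immediately followed by a digit. A lookbehind for `)` and a lookahead for `(` plus a digit make only the commas between rows act as separators; the sample string is made up:

```python
import re

# Made-up sample; the quoted text contains "(Brexit)" and ":(" on purpose
data = "(1,2,'riot if (Brexit) is blocked',3),(4,5,'he did the same :(',6),"

# Split only on a comma that follows ')' and precedes '(' + digit.
# Assumption: no quoted text ever contains "),(" followed by a digit.
row_sep = re.compile(r"(?<=\)),(?=\(\d)")

rows = row_sep.split(data.rstrip(","))  # drop the trailing comma first
interiors = [r[1:-1] for r in rows]      # strip each row's outer parentheses
print(interiors)
```

Each interior string still has to be split into fields in a quote-aware way; a plain `split(',')` would break on commas inside the text.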

Is this the output you're looking for?

big = """(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"',2911510,NULL,NULL,NULL),""" 
small = big.split('),') 
print(small)

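What I'm doing is splitting on `),` and then looping through the pieces and splitting each one on commas as usual. Here is the basic approach, which can certainly be optimized: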

new_list = [] 

for x in small: 
    new_list.append(x.split(',')) 
print(new_list)

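The downside of this is that the last item of new_list is `['']`, left over from the trailing `),`, but you can remove it afterwards.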
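A slightly tidier version of the same split-based approach, sketched on a shortened, made-up string: one list comprehension skips the empty trailing chunk and removes the leading `(` of each row. It keeps the same limitation, since any `),` or `,` inside the quoted text will still break the result:

```python
# Shortened, made-up sample in the same shape as the question's data
big = "(1,2,'no commas in this text',NULL),(3,4,'nor in this one :(',NULL),"

# Split rows on "),", skip the empty string left by the trailing "),",
# remove the leading "(" of each row, then split the fields on ","
new_list = [chunk.lstrip("(").split(",") for chunk in big.split("),") if chunk]
print(new_list)
```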

2017-06-19 15:24:35 MattR

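The only problem with your solution is that there could be a case where the string inside the parentheses contains "),". Anyway, I found a website that does exactly what I needed: http://www.csvjson.com/sql2json – ParKein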

Here is a simple regex solution that captures each comma-separated value in its own group:

\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)

Usage:

import re

input_string = r"""(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"',2911510,NULL,NULL,NULL),"""

# One tuple of 7 captured strings per row
result = re.findall(r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)", input_string)
print(result)
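`re.findall` returns one tuple of captured strings per row. As a possible follow-up (an assumption, not part of the answer), the tuples can be turned into the list-of-lists shape asked for in the question, mapping `NULL` to `None` and digit strings to `int`. The helper `to_value` is hypothetical and the sample is shortened and made up:

```python
import re

# The answer's pattern: 7 fields, the third one single-quoted with \-escapes
pattern = re.compile(
    r"\(([^,]*),([^,]*),'((?:\\.|[^'])*)',([^,]*),([^,]*),([^,]*),([^)]*)"
)

# Shortened, made-up sample with an escaped quote and "(Brexit)" in the text
sample = r"(1,2,'I didn\'t say (Brexit)',3,NULL,NULL,NULL),(4,5,'same :(',6,NULL,NULL,NULL),"


def to_value(field):
    """Hypothetical helper: NULL -> None, digit strings -> int, else unchanged."""
    if field == "NULL":
        return None
    if field.isdigit():
        return int(field)
    return field


# findall gives one tuple of captured strings per row; make each a list
rows = [[to_value(f) for f in groups] for groups in pattern.findall(sample)]
print(rows)
```

The quoted field keeps its backslash escapes (`didn\'t`); unescaping them is a separate step.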

2017-06-19 15:49:11

Nested parentheses aren't a problem here because they are inside quotes. All you need to do is match the quoted parts separately:

import re 

pat = re.compile(r"[^()',]+|'[^'\\]*(?:\\.[^'\\]*)*'|(\()|(\))", re.DOTALL) 

s = r'''(1694439,805577453641105408,'\"@Bessemerband not reverse gear simply pointing out that I didn\'t say what you claim I said. I will absolutely riot if (Brexit) is blocked.\"',2887640,NULL,NULL,NULL),(1649240,805577446758158336,'\"Ugh FFS the people you use to look up to fail to use critical thinking. Smh. He did the same thing with brexit :(\"',2911510,NULL,NULL,NULL),''' 

result = [] 

for m in pat.finditer(s):
    if m.group(1):      # "(" opens a new row
        tmplst = []
    elif m.group(2):    # ")" closes the current row
        result.append(tmplst)
    else:               # an unquoted value or a whole quoted string
        tmplst.append(m.group(0))

print(result)

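If your string can also contain parentheses that are not between quotes, you can solve the problem with a recursive pattern using the regex module (using it together with the csv module is a good idea), or by building a state machine.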
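A minimal sketch of the state-machine option, under stated assumptions: a single pass tracks whether we are inside a single-quoted string, whether the previous character was a backslash escape, and the current parenthesis depth, so unquoted nested parentheses stay in their row. Each row's interior is then handed to the standard csv module with `quotechar="'"` and `escapechar="\\"`, which also removes the backslash escapes. The function name `split_rows` and the sample are illustrative only:

```python
import csv


def split_rows(s):
    """Return the text inside each top-level (...) group of s.

    A tiny state machine: single-quoted strings and backslash escapes are
    respected, and parentheses outside quotes are counted, so a nested,
    unquoted pair stays inside its row.
    """
    rows, buf = [], []
    depth = 0
    in_quote = escaped = False
    for ch in s:
        if in_quote:
            if escaped:
                escaped = False      # this char was escaped: keep it, stay quoted
            elif ch == "\\":
                escaped = True       # the next char is escaped
            elif ch == "'":
                in_quote = False     # closing quote
        elif ch == "'":
            in_quote = True          # opening quote
        elif ch == "(":
            depth += 1
            if depth == 1:           # a new row starts: drop its "("
                continue
        elif ch == ")":
            depth -= 1
            if depth == 0:           # the row ends: store it, drop its ")"
                rows.append("".join(buf))
                buf = []
                continue
        if depth >= 1:
            buf.append(ch)           # separators between rows (depth 0) are dropped
    return rows


# Made-up sample: parentheses inside quotes, an escaped quote, and an
# unquoted nested pair f(x)
sample = r"(1,2,'I didn\'t say (Brexit)',NULL),(3,f(x),'same :(',NULL),"

# Hand each row's interior to csv, telling it how the quoted fields are written
rows = list(csv.reader(split_rows(sample), quotechar="'", escapechar="\\"))
print(rows)
```

The csv module knows nothing about parentheses, so an unquoted nested group that itself contains a comma (such as `f(a,b)`) would still be split into separate fields; that case would need the depth tracking applied to the fields as well.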

2017-06-19 16:17:12
