Replace multiple spaces with a single space, unless they appear between quotes?

I have a use case where I want to replace multiple spaces with a single space, unless they appear within quotes. For example:

Original

this is the first  a  b  c
this is the second "a  b  c"

After

this is the first a b c
this is the second "a  b  c"

I believe a regex should be able to do the trick, but I don't have much experience with them. Here is some code I already have:

import re

text = 'this is the second "a  b  c"'
# Replace all runs of whitespace with a single space
# (this also collapses the spaces inside the quotes)
print(re.sub(r'\s\s+', ' ', text))

# Doesn't work, but something like this
print(re.sub(r'[\"]^.*\s\s+.*[\"]^', ' ', text))

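I understand why my second attempt above doesn't work, so I'm just looking for alternative approaches. If possible, could you explain some parts of your regex solution? Thanks.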


2013-03-20 Shane

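Do you have something like this: 'asdasdasd"asdasdasd ____ asdajskd"' ('_' represents a space)? Are you only working with spaces, or do you also want to handle newlines? – nhahtdh 2013-03-20 17:06:29

Yes. What is inside the quotes can be anything, and it should be ignored. – Shane 2013-03-20 17:09:39

'What is inside the quotes can be anything': can it contain newlines? – nhahtdh 2013-03-20 17:11:59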

Assuming there are no " characters inside the quoted "substring":

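import re

text = 'a  b   c  "d   e   f"'
# Keep quoted substrings (group 1) as they are; turn any other run of spaces/tabs into one space
text = re.sub(r'("[^"]*")|[ \t]+', lambda m: m.group(1) if m.group(1) else ' ', text)

print(text)
# a b c "d   e   f"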

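The regex ("[^"]*")|[ \t]+ matches either a quoted substring (assuming it contains no " character) or a run of one or more spaces or tabs. Because a whole quoted substring is consumed as a single match, the whitespace inside it is never available to the alternative subpattern [ \t]+, so it is left untouched.

The pattern that matches the quoted substring is enclosed in () so that the callback can check whether it matched. If it did, m.group(1) is truthy and its value is simply returned. If not, whitespace was matched, so a single space is returned as the replacement value.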
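Since the question asked for the parts of the regex to be explained, here is the same pattern spelled out with re.VERBOSE so each piece carries a comment. This is only a sketch: the names PATTERN and collapse_spaces are made up for illustration, and it keeps the same assumption that a quoted substring contains no " character.

```python
import re

# Same pattern as above, written in verbose mode so each part can be commented.
# In verbose mode whitespace in the pattern is ignored, except inside a
# character class such as [ \t], where it is still matched literally.
PATTERN = re.compile(r'''
    (           # group 1: a complete quoted substring
        "       #   opening quote
        [^"]*   #   any run of characters that are not a quote
        "       #   closing quote
    )
    |           # OR
    [ \t]+      # one or more spaces or tabs outside quotes
''', re.VERBOSE)


def collapse_spaces(text):
    # Hypothetical helper: return group 1 unchanged if the quoted branch
    # matched, otherwise replace the matched whitespace run with one space.
    return PATTERN.sub(lambda m: m.group(1) or ' ', text)


print(collapse_spaces('this  is   the second   "a   b   c"'))
# this is the second "a   b   c"
```

An empty quoted string "" still counts as a match for group 1, because the matched text is the two quote characters and is therefore truthy.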

Without a lambda:

import re

def repl(match):
    # group(1) is set only when the quoted alternative matched
    quoted = match.group(1)
    return quoted if quoted else ' '

text = 'a  b   c  "d   e   f"'
text = re.sub(r'("[^"]*")|[ \t]+', repl, text)
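If the quoted substring may contain backslash-escaped quotes, the "no " inside the substring" assumption breaks. One possible variant, offered as a sketch rather than as part of the original answer, lets the quoted branch skip over any backslash-escaped character. The names ESCAPED_PATTERN and collapse_spaces_escaped are hypothetical, and the sketch assumes a backslash always escapes the next character inside quotes.

```python
import re

# Assumption: inside quotes, a backslash escapes the character that follows it.
# "(?:\\.|[^"\\])*" reads: an opening quote, then any mix of escaped characters
# and ordinary characters that are neither a quote nor a backslash, then a
# closing quote.
ESCAPED_PATTERN = re.compile(r'("(?:\\.|[^"\\])*")|[ \t]+')


def collapse_spaces_escaped(text):
    # Hypothetical helper: quoted substrings (group 1) are returned unchanged,
    # any other run of spaces/tabs becomes a single space.
    return ESCAPED_PATTERN.sub(lambda m: m.group(1) or ' ', text)


print(collapse_spaces_escaped(r'x   "a \"  b\"  c"   y'))
# x "a \"  b\"  c" y
```

Escaped quotes that appear outside any quoted substring are not handled by this sketch.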


2013-03-20 17:51:50 MikeM

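If you want a solution that works reliably every time, regardless of the input and without caveats such as not allowing embedded quotes, then you want to write a simple parser rather than use a regex or split on quotes.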

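def parse(s):
    last = ''      # previous character that was kept
    result = ''
    toggle = 0     # 1 while inside a quoted section
    for c in s:
        # A quote not preceded by a backslash flips between inside and outside
        if c == '"' and last != '\\':
            toggle ^= 1
        # Outside quotes, drop a space that follows another space
        if c == ' ' and toggle == 0 and last == ' ':
            continue
        result += c
        last = c
    return result

test = r'"  <  >"test  1  2  3  "a \"<  >\"  b c"'
print(test)
print(parse(test))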
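For comparison, the split-on-quotes approach mentioned above might look roughly like the sketch below. The function name collapse_outside_quotes is made up for illustration, and the sketch assumes quotes are balanced and never escaped, which is exactly the kind of caveat the parser avoids.

```python
import re


def collapse_outside_quotes(s):
    # Hypothetical helper; assumes quotes are balanced and never escaped.
    # After splitting on '"', even-indexed pieces lie outside quotes and
    # odd-indexed pieces lie inside them.
    parts = s.split('"')
    for i in range(0, len(parts), 2):
        # Collapse runs of two or more spaces in the unquoted pieces only
        parts[i] = re.sub(r' {2,}', ' ', parts[i])
    return '"'.join(parts)


print(collapse_outside_quotes('one   two "a   b"   three'))
# one two "a   b" three
```

With an escaped quote such as \" inside a quoted section, the even/odd bookkeeping goes out of step, so this version is only suitable when the input is known to be simple.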


2013-03-20 18:55:24
