Splitting a string

I have a text string that looks like this:

'tx cycle up.... down 
rx cycle up.... down 
phase:... 
rx on scan: 123456 
tx cycle up.... down 
rx cycle up.... down 
phase:... 
rx on scan: 789012 
setup 
tx cycle up.... down 
rx cycle up.... down 
tx cycle up.... down 
rx cycle up.... down'

I want to split this string into a list of strings, broken into these chunks:

['tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 123456', 
'tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 789012', 
'tx cycle up... down rx cycle up.... down', 
'tx cycle up... down rx cycle up.... down']

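Sometimes the chunks have a 'phase' and a 'scan' number, but sometimes they don't. I need this to be general enough to handle any of these cases, because it will have to run on a large amount of data.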

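Basically, I want to split it into a list of strings where each element extends from one 'tx' to the next 'tx' (including the first 'tx', but not the next one, in that element). How can I do this?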

Edit: suppose that in addition to the text string above, I also have other text strings that look like this:

'closeloop start 
closeloop ..up:677 down:098 
closeloop start 
closeloop ..up:568 down:123'

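My code goes through each text string and splits it into a list using the split code. But when it reaches this text string, it won't find anything to split on. So how can I include a command to split on the 'closeloop start' lines when they appear, as well as on the tx lines, as before, when those appear? I tried this code, but I get a TypeError: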

data = re.split(r'\n((?=tx)|(?=closeloop\sstart))', data)

Source

2017-08-30 Wynne T

You can split on a newline that is followed by tx:

import re 

re.split(r'\n(?=tx)', inputtext)

Demo:

>>> import re 
>>> inputtext = '''tx cycle up.... down 
... rx cycle up.... down 
... phase:... 
... rx on scan: 123456 
... tx cycle up.... down 
... rx cycle up.... down 
... phase:... 
... rx on scan: 789012 
... setup 
... tx cycle up.... down 
... rx cycle up.... down 
... tx cycle up.... down 
... rx cycle up.... down''' 
>>> re.split(r'\n(?=tx)', inputtext) 
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456', 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup', 'tx cycle up.... down\nrx cycle up.... down', 'tx cycle up.... down\nrx cycle up.... down'] 
>>> from pprint import pprint 
>>> pprint(_) 
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456', 
'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup', 
'tx cycle up.... down\nrx cycle up.... down', 
'tx cycle up.... down\nrx cycle up.... down']
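The chunks above still contain embedded newlines, whereas the list in the question shows each chunk on a single line. Here is a minimal sketch of one way to flatten them, assuming it is acceptable to collapse every run of whitespace into a single space. `split_sections` is a hypothetical helper name, not something from the answer.

```python
import re

def split_sections(text, start=r'tx'):
    """Split before every line that begins with `start`, then flatten each chunk."""
    chunks = re.split(r'\n(?=' + start + r')', text.strip())
    # str.split() with no arguments splits on any run of whitespace, so joining
    # with a single space removes newlines and trailing spaces in one step.
    return [' '.join(chunk.split()) for chunk in chunks]

sample = '''tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 123456
tx cycle up.... down
rx cycle up.... down'''

for section in split_sections(sample):
    print(section)
```

Note that any extra line inside a chunk (such as the 'setup' line in the demo) stays in the flattened string; dropping such lines would need an additional filter.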

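However, if you are just looping over an input file object (reading it line by line), you can process each block as you collect the lines: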

section = []
for line in open_file_object:
    if line.startswith('tx'):
        # new section
        if section:
            process_section(section)
        section = [line]
    else:
        section.append(line)
if section:
    process_section(section)
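The same loop can be wrapped in a generator so that the caller receives one section at a time. This is only a sketch: `iter_sections` is a hypothetical name, the `starts` parameter is an assumption added for illustration, and `io.StringIO` stands in for a real open file.

```python
import io

def iter_sections(lines, starts=('tx',)):
    """Yield lists of lines, starting a new list at each line that begins with one of `starts`."""
    section = []
    for line in lines:
        # str.startswith accepts a tuple of prefixes.
        if line.startswith(starts) and section:
            yield section
            section = []
        section.append(line.rstrip())
    if section:
        yield section

fake_file = io.StringIO(
    'tx cycle up.... down\n'
    'rx cycle up.... down\n'
    'phase:...\n'
    'tx cycle up.... down\n'
    'rx cycle up.... down\n'
)
for section in iter_sections(fake_file):
    print(section)
```

Because only one section is held in memory at a time, this form suits large files; passing `starts=('tx', 'closeloop start')` would cover both kinds of starting line.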

If you need to match more than one kind of starting line, include each one as a |-separated alternative inside the lookahead:

data = re.split(r'\n(?=tx|closeloop\sstart)', data)
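As a quick check, the sketch below runs the combined pattern on a string shaped like the second sample from the question, then runs the pattern from the question for comparison. The outer parentheses in the question's pattern form a capturing group, and `re.split` adds captured text to its result, so an extra empty string appears between the chunks. Whether that is what led to the reported TypeError cannot be determined from the post alone.

```python
import re

data = '''closeloop start
closeloop ..up:677 down:098
closeloop start
closeloop ..up:568 down:123'''

# Lookahead with alternatives: split before 'tx' or before 'closeloop start'.
parts = re.split(r'\n(?=tx|closeloop\sstart)', data)
print(parts)

# Pattern from the question: the outer (...) is a capturing group, so the
# empty string it captures shows up as an extra list element.
captured = re.split(r'\n((?=tx)|(?=closeloop\sstart))', data)
print(captured)
```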

Source

2017-08-30 18:40:12

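To the OP: note that the sequence '(?=tx)' is a lookahead, which keeps the split() function from "discarding" the tx when it splits. Without it (like this: r'\ntx'), the tx part would be missing from your results.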

@RickTeachey: No, that is not a non-capturing group. It is a lookahead assertion. Lookarounds are not part of the match; they act as anchors.

@MartijnPieters Thank you! I have just added a new part to my question in a new edit, if you would like to take a look at that as well.
