How do I parse data with binary elements into a Python list?

The sample looks like this:

lst = ['ms 20 3 -s 10 \n', '17954 11302 58011\n', '\n', '$$\n', 'segsites: 10\n', 'positions: 0.0706 0.2241 0.2575 0.889 \n', '0001000010\n', '0101000010\n', '0101010010\n', '0001000010\n', '\n', '$$\n', 'segsites: 10\n', 'positions: 0.0038 0.1622 0.1972 \n', '0110000110\n', '1001001000\n', '0010000110\n', '$$\n', 'segsites: 10\n', 'positions: 0.0155 0.0779 0.2092 \n', '0000001011\n', '0000001011\n', '0000001011\n']

Each new set starts with $$. I need to parse the data so that I end up with the following list.

sample = [['0001000010', '0101000010', '0101010010', '0001000010'], ['0110000110', '1001001000', '0010000110'], ['0000001011', '0000001011', '0000001011']]  # Required output

This is the code I tried for parsing the data while working out how to get this right:

sample =[[]] 
sample1 = "" 
seqlist = [] 

for line in lst: 
    if line.startswith("$$"): 
     if line in '01': #Line contains only 0's or 1 
      sample1.append(line) #Append each line that with 1 and 0's in a string one after another 
    sample.append(sample1.strip()) #Do this or last line is lost 
print sample 

Output:[[], '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

I'm a newbie. Suggestions on how to modify the code, along with an explanation, would be appreciated.

2016-11-07 biogeek

I would do it the following way:

import re 

lst = ['ms 20 3 -s 10 \n', '17954 11302 58011\n', '\n', '$$\n', 'segsites: 10\n', 'positions: 0.0706 0.2241 0.2575 0.889 \n', '0001000010\n', '0101000010\n', '0101010010\n', '0001000010\n', '\n', '$$\n', 'segsites: 10\n', 'positions: 0.0038 0.1622 0.1972 \n', '0110000110\n', '1001001000\n', '0010000110\n', '$$\n', 'segsites: 10\n', 'positions: 0.0155 0.0779 0.2092 \n', '0000001011\n', '0000001011\n', '0000001011\n'] 

result = []
curr_group = []
for item in lst:
    item = item.rstrip()  # Remove the trailing \n
    if '$$' in item:
        if len(curr_group) > 0:  # Only store a group if binary strings were found
            result.append(curr_group)
            curr_group = []
    elif re.match('[01]+$', item):  # True if the string consists only of 0s and 1s
        curr_group.append(item)

result.append(curr_group)  # Append the final group, since there is no closing '$$'

print(result)

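Basically, you want to iterate over the items until you find '$$', then add any binary strings found so far to the final result and start a new group. Every binary string you find (using the regular expression) should be added to the current group.

Finally, you need to add the last group of binary strings, because there is no trailing '$$'.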
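The same loop can be wrapped in a function that takes the delimiter as an argument, which may help when the real data uses a different separator. This is only a sketch: `parse_groups` and its `delimiter` parameter are names assumed here for illustration, and the tiny `data` list is made up rather than taken from the question.

```python
import re

def parse_groups(lines, delimiter="$$"):
    """Collect runs of binary strings that follow each delimiter line.

    Hypothetical helper: the name and signature are assumptions.
    """
    groups = []
    current = []
    for line in lines:
        line = line.strip()  # drop the trailing newline and stray spaces
        if line.startswith(delimiter):
            if current:  # close the previous group, if it has any entries
                groups.append(current)
            current = []
        elif re.fullmatch(r"[01]+", line):  # the whole line must be 0s and 1s
            current.append(line)
    if current:  # the last group has no closing delimiter
        groups.append(current)
    return groups

# Made-up miniature input, in the same shape as the question's list.
data = ['header\n', '$$\n', 'segsites: 2\n', '01\n', '10\n', '\n', '$$\n', '11\n']
print(parse_groups(data))  # [['01', '10'], ['11']]
```

Unlike the loop above, this version skips an empty final group instead of appending it. Passing another separator, for example `parse_groups(lines, delimiter="//")`, should be the only change needed if the delimiter differs.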

2016-11-07 05:34:30 Darkstarone

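I seem to have trouble getting it to work on my raw data, although it works on the sample set. https://eval.in/673188 – biogeek

In my raw dataset the delimiter ($$) is different. As soon as I change the delimiter, the output breaks. – biogeek

Any suggestions? – biogeek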

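Your problem is (at least) here: if line in '01'.

This test checks whether line is a substring of '01', so it is true only for '', '0', '1' and '01', which is definitely not what you want.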
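A quick check in the interpreter shows how the `in` operator behaves on strings. The variable names below are chosen for illustration only; the data line is one of the entries from the question's list.

```python
line = '0001000010\n'  # one data line from the question, newline included

# `x in '01'` asks whether x occurs as a substring of '01'.
whole_line_check = line in '01'
print(whole_line_check)  # False: the line is far longer than '01'

# Only very short strings pass the substring test.
print('0' in '01')   # True
print('01' in '01')  # True
print('10' in '01')  # False
```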

A basic but working approach would be to check, for each string, whether it consists only of 0s and 1s:

def is_binary(string):
    for c in string:
        if c not in '01':  # any other character means the string is not binary
            return False
    return True

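The function returns True if string can be interpreted as a binary value, and False otherwise.

Of course, you still have to deal with the '\n' at the end of each line, but you get the main idea ;)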
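One possible way to put the pieces together is sketched below: strip each line first, start a new sub-list at every '$$', and keep only the lines that pass the binary check. The `lst` here is a shortened copy of the question's data, and the grouping loop is an assumption about how the helper might be used, not part of the original answer.

```python
def is_binary(string):
    # Same idea as the helper above, written with all().
    return all(c in '01' for c in string)

# Shortened version of the question's list (two sets instead of three).
lst = ['ms 20 3 -s 10 \n', '17954 11302 58011\n', '\n', '$$\n',
       'segsites: 10\n', 'positions: 0.0706 0.2241\n',
       '0001000010\n', '0101000010\n', '\n', '$$\n',
       'segsites: 10\n', 'positions: 0.0038 0.1622\n', '0110000110\n']

sample = []
for line in lst:
    line = line.strip()  # removes the trailing '\n'
    if line.startswith('$$'):
        sample.append([])  # a new set begins
    elif line and sample and is_binary(line):
        # Skip empty lines (is_binary('') is True) and anything before the first '$$'.
        sample[-1].append(line)

print(sample)  # [['0001000010', '0101000010'], ['0110000110']]
```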

2016-11-07 05:33:59
