2017-08-24 234 views

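Finding the start and end points of the longest run of consecutive digits in a large binary data set in Python

I am using Python 3 to find the start and end points of the longest runs of consecutive digits in a large set of binary digits. So far I have found the length of the longest run of 1s and of 0s; now I need to find where each of those runs starts and ends. My code so far is: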

For the 1s:

def getMaxSegmentLength(readable):
    current_length = 0
    max_length = 0

    for x in readable:
        if x == '1':
            current_length += 1
        else:
            # the streak ended: keep the longest length seen so far
            max_length = max(max_length, current_length)
            current_length = 0

    # return after the loop so a streak that reaches the end of the input counts
    return max(max_length, current_length)


def main():
    with open('C:/01.txt', 'r') as inputf:
        s = inputf.read()
        n = getMaxSegmentLength(s)
    print("The longest streak of 1's = " + str(n))


if __name__ == '__main__':
    main()

For the 0s:

def getMaxSegmentLength(readable):
    current_length = 0
    max_length = 0

    for x in readable:
        if x == '0':
            current_length += 1
        else:
            # the streak ended: keep the longest length seen so far
            max_length = max(max_length, current_length)
            current_length = 0

    # return after the loop so a streak that reaches the end of the input counts
    return max(max_length, current_length)


def main():
    with open('C:/01.txt', 'r') as inputf:
        s = inputf.read()
        m = getMaxSegmentLength(s)
    print("The longest streak of 0's = " + str(m))


if __name__ == '__main__':
    main()

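This code finds the length of the longest run of consecutive digits in a very large binary data set stored in a separate file. I also know the total number of 0s and 1s, but I have not yet started on the next step of finding the start and end points. Any help is greatly appreciated, as I am new to Python 3.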


I think you need enumerate (https://docs.python.org/2.3/whatsnew/section-enumerate.html).
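To illustrate what the comment suggests: `enumerate` yields each character together with its index, which is the piece of information the length-only code lacks. The sketch below is only an illustration, and `run_starts` is a hypothetical helper name, not something from the question. It records the index at which every new run begins.

```python
def run_starts(bits):
    """Return (index, digit) for every position where a new run begins."""
    starts = []
    previous = None
    # enumerate pairs each character with its position in the string
    for index, char in enumerate(bits):
        if char != previous:
            starts.append((index, char))
        previous = char
    return starts


print(run_starts("0011101"))  # [(0, '0'), (2, '1'), (5, '0'), (6, '1')]
```

With the start index of each run available, the end index follows from the run's length.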

Answers


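Simple: keep track of where the current streak starts, plus a variable (max_starts_at in the code below) that holds the starting point of the longest streak. Each time a longer streak is found, update that variable.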

def getMaxSegmentLength(readable, digit):
    '''find the longest streak of digit in the readable string'''
    current_length = 0
    max_length = 0

    starts_at = -1
    max_starts_at = -1

    for i, x in enumerate(readable):
        if x == digit:
            current_length += 1
            if current_length == 1:
                starts_at = i
        else:
            # the streak just ended: record it if it is the longest so far
            if max_length < current_length:
                max_length = current_length
                max_starts_at = starts_at
            # always reset, even when the streak was not a new maximum
            current_length = 0

    # handle a streak that runs to the end of the input
    if max_length < current_length:
        max_length = current_length
        max_starts_at = starts_at

    max_ends_at = max_starts_at + max_length - 1

    # return a tuple of start point and end point index
    # (if digit never occurs, this is (-1, -2))
    return max_starts_at, max_ends_at


def main():
    with open('F:/input.txt', 'r') as inputf:
        s = inputf.read()

        # check for 1's
        n = getMaxSegmentLength(s, '1')
        print("The longest streak of 1's (start, end) = " + str(n))

        # check for 0's
        n = getMaxSegmentLength(s, '0')
        print("The longest streak of 0's (start, end) = " + str(n))

if __name__ == '__main__':
    main()
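The same start/end tracking could also be sketched with `itertools.groupby`, which groups consecutive equal characters. This is an alternative technique, not the approach of the answer above, and `longest_runs` is a hypothetical name. It assumes the input contains only the digits of interest, so any trailing newline from a file should be stripped first, otherwise it is treated as a symbol of its own.

```python
from itertools import groupby


def longest_runs(bits):
    """Map each symbol to (start, end), inclusive, of its first longest run."""
    best = {}
    position = 0  # index of the first character of the current run
    for symbol, group in groupby(bits):
        length = sum(1 for _ in group)
        start, end = position, position + length - 1
        # keep the earliest run on ties by requiring a strictly longer run
        if symbol not in best or length > best[symbol][1] - best[symbol][0] + 1:
            best[symbol] = (start, end)
        position += length
    return best


print(longest_runs("11010111"))  # {'1': (5, 7), '0': (2, 2)}
```

One pass handles both digits, so the file does not have to be scanned once per digit.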

You can use a regular expression to match each run, then update the dictionary entry for the corresponding digit:

import re

# example input string (named data to avoid shadowing the built-in input())
data = "00111101100010100010101111011011011"

# longest run seen so far for each digit
best = {
    "0": {"start": 0, "len": 0},
    "1": {"start": 0, "len": 0},
}

# (.)\1* matches one character followed by any repeats of it, i.e. one run
for m in re.finditer(r"(.)\1*", data):
    run = m.group()
    if best[run[0]]["len"] < len(run):
        best[run[0]] = {"start": m.start(), "len": len(run)}

print(best)

Output (the key order may vary with the Python version):

{'1': {'start': 2, 'len': 4}, '0': {'start': 9, 'len': 3}} 
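Since the question mentions a very large input, the same per-digit bookkeeping could be done while reading the file in chunks instead of loading it all at once. The following is a rough sketch, not part of the answer above: `longest_runs_in_stream` and `chunk_size` are hypothetical names, and it assumes that any character other than '0' or '1' (such as a newline) should be skipped and not counted in the indices.

```python
import io


def longest_runs_in_stream(stream, chunk_size=1 << 16):
    """Return {digit: (start, length)} for the first longest run of '0' and '1'."""
    best = {"0": (0, 0), "1": (0, 0)}
    current = None   # digit of the run in progress
    run_start = 0
    run_len = 0
    offset = 0       # index of the next digit, ignoring non-digit characters
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if ch not in best:
                continue  # assumption: skip newlines and other characters
            if ch == current:
                run_len += 1
            else:
                current, run_start, run_len = ch, offset, 1
            # strictly greater, so the earliest run wins on ties
            if run_len > best[ch][1]:
                best[ch] = (run_start, run_len)
            offset += 1
    return best


# a tiny chunk size shows that runs spanning chunk boundaries are still joined
sample = io.StringIO("00111101100010100010101111011011011")
print(longest_runs_in_stream(sample, chunk_size=4))
```

For a real file, the stream would be the object returned by `open(path, 'r')`; the run state is carried across chunks, so the chunk size affects only memory use, not the result.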