2017-08-30

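How to parse a hierarchy based on indentation with Python

I have an accounting tree that is stored with indentation/spaces in the source: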

Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses

There is a fixed number of levels, so I want to flatten the hierarchy into 3 fields (the actual data has 6 levels; this example is simplified):

# ws is presumably an openpyxl worksheet and w a csv.writer, both set up elsewhere
for rownum in range(6, ws.max_row + 1):
    accountName = str(ws.cell(row=rownum, column=1).value)
    indent = len(accountName) - len(accountName.lstrip(' '))
    if indent == 0:
        l1 = accountName
        l2 = ''
        l3 = ''
    elif indent == 3:
        l2 = accountName
        l3 = ''
    else:
        l3 = accountName

    w.writerow([l1, l2, l3])

L1        L2            L3
Income
Income    Revenue
Income    Revenue       IAP
Income    Revenue       Ads
Income    Other-Income
Expenses  Developers    In-house
... etc
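The fixed-width approach above can be generalized to any number of levels instead of one `if`/`elif` branch per level. This is a minimal sketch, assuming every level is indented by exactly `width` spaces (3 here); `flatten_fixed` and `rows` are hypothetical names, not part of the question's code.

```python
def flatten_fixed(lines, width=3):
    """Yield the full path for each line of an indented hierarchy.

    Assumes every level is indented by exactly `width` spaces.
    """
    path = []
    for line in lines:
        name = line.strip()
        if not name:
            continue                      # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        level = indent // width           # 0 for top-level accounts
        path = path[:level] + [name]      # keep the ancestors, replace the rest
        yield path

rows = ["Income", "   Revenue", "      IAP", "      Ads",
        "   Other-Income", "Expenses"]
for path in flatten_fixed(rows):
    print(path)
```

Because `path` is rebuilt as a new list on every line, each yielded path stays unchanged after later lines are processed.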

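I can do this by checking the number of spaces before the account name.

Is there a more flexible way to achieve this, based on the current row's indentation compared with the previous row's, rather than assuming it is always 3 spaces per level? L1 will always have no indentation, and we can trust that lower levels are indented further than their parent, but each level might not always be 3 spaces.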

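Update: I ended up using the following as the meat of the logic. Since I ultimately wanted the list of accounts along with the content, the simplest approach seemed to be to use the indentation to decide whether to reset, append to, or pop from the list: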

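from operator import itemgetter

# prev_indent holds the previous row's indent (set to indent at the end of each row)
if indent == 0:
    # a top-level account: start a new list
    accountList = []
    accountList.append((indent, accountName))
elif indent > prev_indent:
    # deeper than the previous row: it is a child of that row
    accountList.append((indent, accountName))
elif indent <= prev_indent:
    # same level or shallower: pop siblings and deeper entries first
    max_indent = int(max(accountList, key=itemgetter(0))[0])
    while max_indent >= indent:
        accountList.pop()
        max_indent = int(max(accountList, key=itemgetter(0))[0])
    accountList.append((indent, accountName))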

so accountList is complete on every row of the output.
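The reset/append/pop logic in the update can be packaged as a self-contained function, which makes it easy to try on sample data. This is a sketch under a few assumptions: `account_paths` and the sample `lines` are hypothetical, and the three branches are merged into a single `while` loop. The `max(accountList, key=itemgetter(0))` lookup is replaced with the last element, which is equivalent because entries are always pushed in increasing indent order.

```python
def account_paths(lines):
    """Yield, for each non-blank line, the (indent, name) pairs from the
    top-level account down to the account on that line."""
    account_list = []
    for line in lines:
        name = line.strip()
        if not name:
            continue                      # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        # Pop every entry at the same or a deeper indent; what is left are the
        # ancestors. For indent == 0 this empties the list (the "reset" case),
        # and for a deeper line nothing is popped (the "append" case).
        while account_list and account_list[-1][0] >= indent:
            account_list.pop()
        account_list.append((indent, name))
        yield list(account_list)          # a copy, since account_list keeps changing

# Irregular indentation: 2 spaces for Revenue, 6 for IAP/Ads, 4 for Developers.
lines = ["Income", "  Revenue", "      IAP", "      Ads",
         "  Other-Income", "Expenses", "    Developers"]
for path in account_paths(lines):
    print(" > ".join(name for _, name in path))
```

Only relative indentation matters here, so the levels do not need to be a multiple of any fixed width.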

Answers


You can mimic the way Python actually parses indentation. First, create a stack that holds the indentation levels. On each line:

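  • If the indentation is greater than the top of the stack, push it and increase the depth level.
  • If it is the same, continue at the same level.
  • If it is lower, pop the entries on top of the stack that are higher than the new indentation. If you reach a lower indentation level before finding exactly the same one, there is an indentation error.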
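indentation = [0]  # stack of indentation widths; level 0 is always open
depth = 0

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")
        content = line.strip()
        if not content:
            continue  # skip blank lines
        # count leading spaces only, so trailing spaces are not taken as indentation
        indent = len(line) - len(line.lstrip(" "))
        if indent > indentation[-1]:
            # deeper than the current level: open a new one
            depth += 1
            indentation.append(indent)

        elif indent < indentation[-1]:
            # shallower: close levels until we are back at a matching one
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()

            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        print(f"{content} (depth: {depth})")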

With a "test.txt" file containing the data you provided:

Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses

Here is the output:

Income (depth: 0) 
Revenue (depth: 1) 
IAP (depth: 2) 
Ads (depth: 2) 
Other-Income (depth: 1) 
Expenses (depth: 0) 
Developers (depth: 1) 
In-house (depth: 2) 
Contractors (depth: 2) 
Advertising (depth: 1) 
Other Expenses (depth: 1)
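The depth-tracking loop above can also be written as a generator over any iterable of lines (a file object, a list, or `io.StringIO`), which makes it easy to test and to combine with the question's goal of one full path per row. A sketch, assuming the hypothetical name `parse_depths`; it raises `ValueError` rather than `RuntimeError` on an inconsistent dedent, and the depth is read from the stack size instead of a separate counter.

```python
def parse_depths(lines):
    """Yield (depth, content) for each non-blank line, tracking indentation
    with a stack of open indent widths, similar to Python's own tokenizer."""
    indentation = [0]                     # level 0 is always open
    for line in lines:
        line = line.rstrip("\n")
        content = line.strip()
        if not content:
            continue                      # blank lines don't change the level
        indent = len(line) - len(line.lstrip(" "))
        if indent > indentation[-1]:
            indentation.append(indent)    # deeper: open a new level
        else:
            while indent < indentation[-1]:
                indentation.pop()         # shallower: close levels
            if indent != indentation[-1]:
                raise ValueError(f"inconsistent dedent: {line!r}")
        yield len(indentation) - 1, content

# Rebuild the question's flattened rows from the depths.
path = []
for depth, content in parse_depths(["Income", "   Revenue", "      IAP", "Expenses"]):
    path = path[:depth] + [content]
    print("\t".join(path))
```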

So, what can you do with this? Suppose you want to build nested lists. First, create a data stack.

  • When you find an indentation, append a new list to the end of the data stack.
  • When you find an unindentation, pop the top list and append it to the new top.

And in every case, for each line, append the content to the list on top of the data stack.

Here is the corresponding implementation:

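indentation = [0]  # stack of indentation widths
depth = 0
data = [[]]        # stack of lists; data[0] collects the top-level items

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")
        content = line.strip()
        if not content:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent > indentation[-1]:
            # indentation: start a new sub-list for the children
            depth += 1
            indentation.append(indent)
            data.append([])

        elif indent < indentation[-1]:
            # unindentation: attach each finished sub-list to its parent
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
                top = data.pop()
                data[-1].append(top)

            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        data[-1].append(content)

# attach any sub-lists still open at the end of the file
while len(data) > 1:
    top = data.pop()
    data[-1].append(top)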

Your nested list is at the top of your data stack. For the same file, the output is:

['Income',
    ['Revenue',
        ['IAP',
         'Ads'
        ],
    'Other-Income'
    ],
'Expenses',
    ['Developers',
        ['In-house',
         'Contractors'
        ],
    'Advertising',
    'Other Expenses'
    ]
]

This is fairly easy to manipulate, although quite deeply nested. You can access the data by chaining item accesses:

>>> l = data[0] 
>>> l 
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expenses']]
>>> l[1] 
['Revenue', ['IAP', 'Ads'], 'Other-Income'] 
>>> l[1][1] 
['IAP', 'Ads'] 
>>> l[1][1][0] 
'IAP' 
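As an illustration of manipulating the nested list, it can be walked recursively to recover each name with its depth. A minimal sketch, assuming the structure built above, where a sub-list always holds the children of the string just before it; `walk` and `tree` are hypothetical names.

```python
def walk(items, depth=0):
    """Yield (depth, name) pairs from the nested-list structure built above."""
    for item in items:
        if isinstance(item, list):
            yield from walk(item, depth + 1)  # children of the preceding name
        else:
            yield depth, item

tree = ['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'],
        'Expenses', ['Developers', ['In-house', 'Contractors'],
                     'Advertising', 'Other Expenses']]
for depth, name in walk(tree):
    print("   " * depth + name)
```

Printed with 3 spaces per level, this reproduces the original indented listing.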

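Thanks for this. I ultimately wanted to output the hierarchy on each row along with the row's content, so I modified it slightly, but this set me off in the right direction.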


If the indentation is a fixed number of spaces (here, 3 spaces), you can simplify the calculation of the indentation level.

Note: I use StringIO to simulate a file.

import io
import itertools

content = u"""\
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
"""

stack = []
for line in io.StringIO(content):
    # drop the trailing "\n", then split on the 3-space unit:
    # one empty string per indentation level, followed by the name
    row = line.rstrip().split("   ")
    stack[:] = stack[:len(row) - 1] + [row[-1]]
    print("\t".join(stack))

You get:

Income 
Income Revenue 
Income Revenue IAP 
Income Revenue Ads 
Income Other-Income 
Expenses 
Expenses Developers 
Expenses Developers In-house 
Expenses Developers Contractors 
Expenses Advertising 
Expenses Other Expenses 
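The question writes a fixed number of fields per row (`w.writerow([l1, l2, l3])`), so each path can be padded with empty strings before it is written. A sketch, assuming a hypothetical `NUM_LEVELS` constant (3 here; the question's real data has 6) and writing to an in-memory buffer instead of a real file.

```python
import csv
import io

NUM_LEVELS = 3  # hypothetical: set to 6 for the question's real data

text = """\
Income
   Revenue
      IAP
Expenses
   Developers
"""

out = io.StringIO()
writer = csv.writer(out)
stack = []
for line in io.StringIO(text):
    row = line.rstrip().split("   ")            # one "" per 3-space level, then the name
    stack = stack[:len(row) - 1] + [row[-1]]    # keep the ancestors, replace the rest
    writer.writerow(stack + [""] * (NUM_LEVELS - len(stack)))  # pad to NUM_LEVELS
print(out.getvalue())
```

Every CSV row then has exactly `NUM_LEVELS` columns, with empty trailing fields for shallower accounts.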

Edit: indentation not fixed

If the indentation is not fixed (you don't always have 3 spaces), as in the following example:

content = u"""\ 
Income 
    Revenue 
    IAP 
    Ads 
    Other-Income 
Expenses 
    Developers 
     In-house 
     Contractors 
    Advertising 
    Other Expenses 
""" 

you need to work out the shift on each new line:

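stack = []
last_indent = u""
for line in io.StringIO(content):
    # leading spaces of this line
    indent = "".join(itertools.takewhile(lambda c: c == " ", line))
    # compared with the previous line: same level (0), shallower (-1) or deeper (+1)
    # note: this assumes the indentation never drops back more than one level at a time
    shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
    index = len(stack) + shift
    stack[:] = stack[:index - 1] + [line.strip()]
    last_indent = indent
    print("\t".join(stack))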
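The shift-based version only moves up one level when the indentation decreases, so a line that jumps back two levels at once (for example from IAP straight to Expenses) ends up under the wrong parent. The sketch below compares it with a swapped-in alternative, a stack of (indent, name) pairs that pops every entry not shallower than the current line (a variant of the indentation stack from the other answer). `shift_version`, `stack_version` and the sample `text` are hypothetical names used only for this comparison.

```python
import io
import itertools

def shift_version(text):
    """The shift-based logic above, wrapped in a function for comparison."""
    rows, stack, last_indent = [], [], ""
    for line in io.StringIO(text):
        indent = "".join(itertools.takewhile(lambda c: c == " ", line))
        shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
        index = len(stack) + shift
        stack[:] = stack[:index - 1] + [line.strip()]
        last_indent = indent
        rows.append(list(stack))
    return rows

def stack_version(text):
    """Pop every open level that is not shallower than the current line."""
    rows, stack = [], []                  # stack holds (indent, name) pairs
    for line in io.StringIO(text):
        name = line.strip()
        if not name:
            continue                      # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        rows.append([n for _, n in stack])
    return rows

text = "Income\n  Revenue\n     IAP\nExpenses\n"  # IAP -> Expenses drops two levels
print(shift_version(text)[-1])
print(stack_version(text)[-1])
```

On this input the shift-based version files Expenses under Income, while the stack-based version correctly returns it to the top level.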