2011-12-02 56 views
1

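How do I extract a specific set of values from a file in Python?

I'm stuck on the logic here. I have to extract some values from a text file that looks like this: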

AAA 
+-------------+------------------+ 
|   ID |   count | 
+-------------+------------------+ 
|   3 |    1445 | 
|   4 |    105 | 
|   9 |    160 | 
|   10 |    30 | 
+-------------+------------------+ 
BBB 
+-------------+------------------+ 
|   ID |   count | 
+-------------+------------------+ 
|   3 |    1445 | 
|   4 |    105 | 
|   9 |    160 | 
|   10 |    30 | 
+-------------+------------------+ 
CCC 
+-------------+------------------+ 
|   ID |   count | 
+-------------+------------------+ 
|   3 |    1445 | 
|   4 |    105 | 
|   9 |    160 | 
|   10 |    30 | 
+-------------+------------------+ 

I can't work out how to extract only the values under BBB and append them to a list. Something like:

import sys

f = open(sys.argv[1], "r")
text = f.readlines()
B_Values = []
for i in text:
    if i.startswith("BBB"):  # (example)
        B_Values.append("only values of BBB")
    if i.startswith("CCC"):
        break

print(B_Values)

This should result in:

['|   3 |    1445 |','|   4 |    105 |','|   9 |    160 |','|   10 |    30 |'] 
+0

Is this homework?

Answers

3
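import sys

d = {}
with open(sys.argv[1]) as f:
    for line in f:
        if line[0].isalpha():  # is the first character in the line a letter?
            # A section name such as "BBB": start (or reuse) its list of rows.
            curr = d.setdefault(line.strip(), [])
        elif any(ch.isdigit() for ch in line):  # is there any digit in the line?
            # Only data rows contain digits; borders and the header do not.
            curr.append(line.strip())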

For this file, d is now:

{'AAA': ['|   3 |    1445 |', 
     '|   4 |    105 |', 
     '|   9 |    160 |', 
     '|   10 |    30 |'], 
'BBB': ['|   3 |    1445 |', 
     '|   4 |    105 |', 
     '|   9 |    160 |', 
     '|   10 |    30 |'], 
'CCC': ['|   3 |    1445 |', 
     '|   4 |    105 |', 
     '|   9 |    160 |', 
     '|   10 |    30 |']} 

B_Values is then d['BBB'].
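
If the numbers are needed rather than the raw row strings, each row can be split on the | character. This is only a minimal sketch, assuming every data row holds exactly two integer columns as in the sample file; parse_row is a hypothetical helper name, not part of the answer above.

```python
def parse_row(row):
    """Turn a row such as '|   3 |    1445 |' into a tuple of ints."""
    # Splitting on '|' leaves empty strings at both ends, so skip blank cells.
    cells = [cell.strip() for cell in row.split("|")]
    return tuple(int(cell) for cell in cells if cell)


# Rows as they appear in d['BBB'] for the sample file.
rows = ['|   3 |    1445 |', '|   4 |    105 |', '|   9 |    160 |', '|   10 |    30 |']
pairs = [parse_row(row) for row in rows]
print(pairs)  # [(3, 1445), (4, 105), (9, 160), (10, 30)]
```

A row with a non-numeric cell would raise ValueError here, which is probably what you want for malformed input.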

0

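You can use a state flag, bstarted, to track when the B group has started. After scanning the B group, delete the three header lines and the one footer line.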

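# `text` is the list returned by f.readlines() in the question.
B_Values = []
bstarted = False
for i in text:
    if i.startswith("BBB"):
        bstarted = True
    elif i.startswith("CCC"):
        bstarted = False
        break
    elif bstarted:
        B_Values.append(i.strip())  # strip the trailing newline and spaces

del B_Values[:3]  # get rid of the header (border, column names, border)
del B_Values[-1]  # get rid of the footer (closing border)
print(B_Values)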
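
The same flag idea can be wrapped in a function that does not hard-code "CCC" or the number of header lines. The following is a sketch under stated assumptions rather than a drop-in replacement: extract_section is a hypothetical name, section names are assumed to be the only lines starting with a letter, and data rows are assumed to be the only lines that start with | and contain a digit. The sample text is shortened and written for Python 3.

```python
def extract_section(lines, name):
    """Return the stripped data rows of the section called `name`."""
    values = []
    started = False  # state flag: are we inside the wanted section?
    for line in lines:
        stripped = line.strip()
        if stripped == name:
            started = True
        elif started and stripped[:1].isalpha():
            break  # the next section name ends the one we want
        elif started and stripped.startswith("|") and any(c.isdigit() for c in stripped):
            values.append(stripped)  # a data row; borders and header are skipped
    return values


# Shortened, made-up sample in the same layout as the file in the question.
sample = """AAA
+----+-------+
| ID | count |
+----+-------+
| 1  | 11    |
+----+-------+
BBB
+----+-------+
| ID | count |
+----+-------+
| 3  | 1445  |
| 4  | 105   |
+----+-------+
CCC
+----+-------+
| ID | count |
+----+-------+
| 9  | 160   |
+----+-------+
"""
print(extract_section(sample.splitlines(), "BBB"))
```

Because rows are filtered by content rather than deleted by position, this also works when the wanted section is the last one in the file.
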
0

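You should avoid looping over lines that have already been read into memory. Instead, call readline whenever you want to read the next line and check what it is: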

import sys

FOOTER = "+-------------+------------------+"

f = open(sys.argv[1], "r")
B_Values = []
i = f.readline()
while i != "":  # readline() returns "" only at end of file
    if i.startswith("BBB"):  # (example)
        for temp in range(3):
            f.readline()  # skip the 3 lines of table header
        i = f.readline()
        # Keep reading until we reach the table footer or the end of the file.
        while i.strip() != FOOTER and i != "":
            B_Values.append(i.strip())
            i = f.readline()
        break
    i = f.readline()

# Although not strictly necessary, it is better to close the file explicitly.
f.close()

print(B_Values)

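Edit: @eumiro's approach is more flexible than mine, because it reads all the values from every section. You could add an isalpha test to my example to read all the values as well, but his approach is still easier to read.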
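
For illustration, here is roughly what the isalpha variant of the readline loop might look like. It is a sketch, not the code either answer actually posted: read_all_sections is a hypothetical name, the input is an in-memory io.StringIO standing in for the real file (Python 3), and data rows are assumed to be the lines that start with | and contain a digit.

```python
import io


def read_all_sections(f):
    """Collect the data rows of every section using readline() only."""
    sections = {}
    current = None  # list of rows for the section being read, if any
    line = f.readline()
    while line != "":  # readline() returns "" only at end of file
        stripped = line.strip()
        if stripped[:1].isalpha():
            # A section name: start (or reuse) the list for its rows.
            current = sections.setdefault(stripped, [])
        elif current is not None and stripped.startswith("|") \
                and any(c.isdigit() for c in stripped):
            current.append(stripped)
        line = f.readline()
    return sections


# Made-up, shortened stand-in for the file in the question.
fake_file = io.StringIO(
    "AAA\n+---+---+\n| ID | count |\n+---+---+\n| 3 | 1445 |\n+---+---+\n"
    "BBB\n+---+---+\n| ID | count |\n+---+---+\n| 4 | 105 |\n| 9 | 160 |\n+---+---+\n"
)
result = read_all_sections(fake_file)
print(result["BBB"])
```

With a real file, pass the object returned by open(sys.argv[1]) instead of fake_file.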
