Python file parsing -> IndexError

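I am parsing an ISI file containing a few hundred records, all of which begin with a 'PT J' tag and end with an 'ER' tag. I am trying to pull the tagged information out of each record in a nested loop, but I keep getting an IndexError. I know why I am getting it, but does anyone have a better way of identifying the start of a new record than checking the first few characters?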

while file:
    while line[1] + line[2] + line[3] + line[4] != 'PT J':
        ...
        Search through and record data from tags
        ...

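I use the same approach to identify tags, so I occasionally run into the same problem there; if you have any suggestions for that as well, I would be much obliged!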

Sample data, which you will notice does not always contain every tag in every record:

PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER

PT J
TI Django for Dummies
DT 4/14/2012
ER

PT J
AF Jim Brown
TI StackOverflow
ER

Source

2012-07-06 MTP

I should point out that I convert the file to .txt before reading it in. – MTP 2012-07-06 02:47:56

Don't the 'ER' lines contain only 'ER'? That is why you get the IndexError: line[4] does not exist on those lines.

The first thing to try would be:

while not line.startswith('PT J'):

instead of your existing while loop.

Also, slicing:

line[1] + line[2] + line[3] + line[4] == line[1:5]

(the end index of a slice is not included)
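
To illustrate why both suggestions avoid the error, here is a minimal sketch using a bare 'ER' line like the one in the sample data. Indexing a single character past the end of a string raises IndexError, whereas slicing and str.startswith() do not. The variable names are illustrative only and are not taken from the question.

```python
line = 'ER'  # an end-of-record line with no trailing whitespace

# Indexing one character at a time fails as soon as an index is out of range.
try:
    line[1] + line[2] + line[3] + line[4]
    index_error = False
except IndexError:
    index_error = True

# Slicing never raises; it returns whatever characters exist in the range.
sliced = line[1:5]

# startswith() is also safe on short lines and compares from position 0.
is_record_start = line.startswith('PT J')

print(index_error, repr(sliced), is_record_start)  # True 'R' False
```

Note that line[1:5] starts at the second character, while startswith('PT J') compares from the first, so the two tests are only equivalent if the tag really begins at index 1.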

Source

2012-07-06 02:51:46 Marius

Yes, the 'ER' (end of record) line usually contains nothing else, not even trailing whitespace. – 2012-07-06 08:15:59

I like your suggestion... I will have to play with it some more. – MTP 2012-07-07 02:57:11

You could try an approach like this to read through your file.

with open('data.txt') as f:
    for line in f:
        # Split the line into a list of tokens separated by whitespace (blanks, tabs).
        line = line.split()
        llen = len(line)
        if llen == 2 and line[0] == 'PT' and line[1] == 'J':  # found start of record
            # process
            # examine line[0] for 'tags', such as "AF", "TI", "DT" and proceed
            # as dictated by your needs, e.g.:
            pass

        if llen > 1 and line[0] == 'AF':  # grab first/last name in line[1] and line[2]
            # The data will be on the same line and
            # accessible via the correct index values.
            pass

        if llen == 1 and line[0] == 'ER':  # found end of record
            pass

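This will definitely need more "programming logic" (most likely nested loops or, better still, function calls) to put everything in the correct order/sequence, but the basic structure is there; I hope it gets you started and gives you some ideas.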
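
As one possible way to fill in that logic (a sketch, not part of the original answer), the tag checks can be wrapped in a function that collects each record into a dictionary. The function name parse_records and the choice of a tag-to-list mapping are assumptions made for illustration. It also assumes every non-blank line starts with a tag followed by its value, so continuation lines are not handled.

```python
def parse_records(lines):
    """Group tagged lines into one dict per record (tag -> list of values)."""
    records = []
    current = None  # None means we are currently outside any record
    for raw in lines:
        parts = raw.split(None, 1)  # the tag, then the rest of the line
        if not parts:
            continue  # skip blank lines
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ''
        if tag == 'PT' and value == 'J':
            current = {}  # start of a new record
        elif tag == 'ER':
            if current is not None:
                records.append(current)  # end of record
            current = None
        elif current is not None:
            # A tag may repeat within a record, so keep every value in a list.
            current.setdefault(tag, []).append(value)
    return records


sample = """PT J
AF Bob Smith
TI Python For Dummies
DT July 4, 2012
ER

PT J
TI Django for Dummies
DT 4/14/2012
ER
""".splitlines()

records = parse_records(sample)
for record in records:
    # .get() returns None for tags that a record does not contain.
    print(record.get('TI'), record.get('AF'))
```

Because missing tags are simply absent from the dictionary, a record without 'AF' does not raise an error when read with .get().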

Source

2012-07-06 02:54:00 Levon

with open('data1.txt') as f:
    for line in f:
        if line.strip() == 'PT J':
            # The inner loop continues reading from the same file iterator.
            for line in f:
                if line.strip() != 'ER' and line.strip():
                    # do something with data
                    pass
                elif line.strip() == 'ER':
                    # this record ends here; move to the next record
                    break

Source

2012-07-06 03:00:14

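I think I see what is going on here, but how would I access different lines in order to manipulate or test them? Since line is acting as the iterator, we cannot say something like line = file.readline() inside the nested 'if' statement. What would replace line = file.readline() so that I can get at specific lines? I ask because in some cases there are multiple entities per tag. – MTP 2012-07-07 03:54:39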
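
One possible approach, offered as a sketch and not as the answerer's own method: a file object is its own iterator, so calling the built-in next() on it inside the loop pulls the following line from the same position the for loop is reading from. The example below uses io.StringIO as a stand-in for the opened file, and the handling of the 'TI' tag is hypothetical.

```python
import io

# StringIO stands in for open('data1.txt'); both can be iterated line by line.
f = io.StringIO("PT J\nAF Bob Smith\nTI Python For Dummies\nER\n")

titles = []
for line in f:
    if line.strip() == 'PT J':
        # Pull the next line explicitly; the default '' signals end of file.
        following = next(f, '')
        while following and following.strip() != 'ER':
            if following.startswith('TI '):
                titles.append(following[3:].strip())
            following = next(f, '')

print(titles)  # ['Python For Dummies']
```

Passing a default to next() avoids a StopIteration if the file ends before an 'ER' line is found.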
