Indexing in Python

I want to index an Excel file using the whoosh package, but I get an error: list index out of range. Can anyone help me? My code is:

from whoosh import fields, index 
import os.path 
import csv 
import codecs 

# This list associates a name with each position in a row 
columns = ["juza","chapter","verse","analysis"] 

schema = fields.Schema(juza=fields.NUMERIC, 
         chapter=fields.NUMERIC, 
         verse=fields.NUMERIC, 
         analysis=fields.KEYWORD) 


# Create the Whoosh index 
indexname = "index" 
if not os.path.exists(indexname): 
    os.mkdir(indexname) 
ix = index.create_in(indexname, schema) 

# Open a writer for the index 
with ix.writer() as writer:
    # Open the CSV file
    with codecs.open("yom.csv", "rb", "utf8") as csvfile:
        # Create a csv reader object for the file
        csvreader = csv.reader(csvfile)

        # Read each row in the file
        for row in csvreader:

            # Create a dictionary to hold the document values for this row
            doc = {}

            # Read the values for the row enumerated like
            # (0, "juza"), (1, "chapter"), etc.
            for colnum, value in enumerate(row):

                # Get the field name from the "columns" list
                fieldname = columns[colnum]

                # Strip any whitespace and convert to unicode
                # NOTE: you need to pass the right encoding here!
                value = unicode(value.strip(), "utf-8")

                # Put the value in the dictionary
                doc[fieldname] = value

            # Pass the dictionary to the add_document method
            writer.add_document(**doc)
    writer.commit()

I get this error and I don't know why. The error:

Traceback (most recent call last):
  File "C:\Python27\yarab.py", line 39, in <module>
    fieldname = columns[colnum]
IndexError: list index out of range

And my CSV file:

1 3 1 Al+ POS:ADJ LEM:r~aHoma`n ROOT:rHm MS GEN 
1 3 2 Al+ POS:ADJ LEM:r~aHiym ROOT:rHm MS GEN 
1 4 1 POS:N ACT PCPL LEM:ma`lik ROOT:mlk M GEN 
1 4 2 POS:N LEM:yawom ROOT:ywm M GEN 
1 4 3 Al+ POS:N LEM:diyn ROOT:dyn M GEN 
1 5 1 POS:PRON LEM:<iy~aA 2MS

Source

2013-02-20 user2091683

csv.reader uses a comma as its default delimiter: ,

You have to define your delimiter explicitly:

csvreader = csv.reader(csvfile, delimiter=...)
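To see how the delimiter changes the number of fields per row, here is a small sketch using one line copied from the question's sample data. It assumes the file is space-separated, as the pasted sample suggests; the real file may differ. Any row that yields more than four fields would push "colnum" past the end of the four-entry "columns" list, which is the kind of failure the IndexError reports.

```python
import csv
import io

# One line copied from the sample data in the question.
sample = u"1 3 1 Al+ POS:ADJ LEM:r~aHoma`n ROOT:rHm MS GEN\n"

# Default delimiter (comma): the line contains no commas,
# so the whole line comes back as a single field.
default_row = next(csv.reader(io.StringIO(sample)))

# Space delimiter (an assumption about the file): every
# space-separated token becomes its own field.
space_row = next(csv.reader(io.StringIO(sample), delimiter=" "))

print(len(default_row))
print(len(space_row))
print(space_row[:4])
```

With a space delimiter the analysis text is broken into several fields instead of staying in the fourth column, so csv.reader alone does not produce the four-column layout the schema expects.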

However, your CSV file is not homogeneous. It would be better to read it without csv:

columns = ["juza","chapter","verse","analysis"] 
with codecs.open("yom.csv", "rb","utf8") as f: 
    for line in f: 
        a, b, c, rest = line.split(' ', 3)
        doc = {k:v.strip() for k,v in zip(columns, rest.split(':'))}
        # a,b,c are the first three integers
        # doc is a dictionary
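As a variation on the idea of splitting each line by hand, the sketch below maps one line directly onto the four field names from the question. It is only an illustration under assumptions: "parse_line" is a hypothetical helper, the first three fields are assumed to be integers, and everything after the third field is treated as the analysis text.

```python
COLUMNS = ["juza", "chapter", "verse", "analysis"]


def parse_line(line):
    """Return a dict for one line, or None if it has fewer than four fields."""
    # split(None, 3) splits on any run of whitespace (spaces or tabs),
    # at most three times, so the analysis text stays in one piece.
    parts = line.split(None, 3)
    if len(parts) < 4:
        return None
    # int() raises ValueError if one of the first three fields is not numeric.
    values = [int(parts[0]), int(parts[1]), int(parts[2]), parts[3].strip()]
    return dict(zip(COLUMNS, values))


doc = parse_line(u"1 4 2 POS:N LEM:yawom ROOT:ywm M GEN\n")
print(doc)
```

A dict built this way has exactly the keys of the schema, so it could be passed on as writer.add_document(**doc) the same way the question's code does; lines for which the helper returns None would simply be skipped.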

Source

2013-02-20 15:03:22 eumiro

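Do you mean I should remove "csvreader" and replace it with the code you recommended? But if I do that, the question now is how I get the field name in the following line: "fieldname = columns[colnum]" – user2091683 2013-02-20 15:26:01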

@user2091683 - You don't need "colnum" anymore. "zip(columns, rest.split(':'))" pairs them up, and the "doc" dictionary contains the whole entry. – eumiro 2013-02-20 15:29:01

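This is my new code, please open this link: (http://pastebin.com/qWPZsiyd). It gives this error: Traceback (most recent call last): File "C:\Python27\yarab.py", line 26, in <module> juza, chapter, verse, analysis = line.split(' ', 3) ValueError: need more than 1 value to unpack – user2091683 2013-02-20 15:42:51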
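"need more than 1 value to unpack" means the split returned a single piece for some line, which happens when the line contains no space at all, for example a blank line or a line separated by tabs. Which of these applies to the actual file is not known from the thread. A minimal sketch for locating such lines follows; "find_short_lines" and the sample lines are hypothetical.

```python
def find_short_lines(lines, expected=4):
    """Yield (line_number, text) for lines with fewer than `expected` fields."""
    for number, line in enumerate(lines, 1):
        # Same split as in the failing statement: single space, at most 3 splits.
        if len(line.split(" ", expected - 1)) < expected:
            yield number, line


sample = [
    u"1 4 3 Al+ POS:N LEM:diyn ROOT:dyn M GEN\n",
    u"\n",                   # blank line (hypothetical)
    u"1\t5\t1\tPOS:PRON\n",  # tab-separated line (hypothetical)
]
problems = list(find_short_lines(sample))
for number, text in problems:
    print(number, repr(text))
```

Running the same check over the lines of the real file would show which ones cannot be unpacked into four values, and whether they are empty or use a different separator.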
