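Reading an XLS file and converting it to CSV with Python

I need to convert an XLS file to CSV so that the data can be loaded into a PostgreSQL database. I use the following code for the conversion: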

import xlrd 
import unicodecsv 

def xls2csv(xls_filename, csv_filename):
    # Converts an Excel file to a CSV file.
    # If the Excel file has multiple worksheets, only the first worksheet is converted.
    # Uses unicodecsv, so it will handle Unicode characters.
    # Uses a recent version of xlrd, so it should handle old .xls and new .xlsx equally well.

    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)

    fh = open(csv_filename, "wb")
    csv_out = unicodecsv.writer(fh, encoding='utf-8')

    for row_number in xrange(sh.nrows):
        csv_out.writerow(sh.row_values(row_number))

    fh.close()

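The XLS file I am using contains 212 columns and at least 100 rows. When I test the code with only 4 rows it works fine, but when nrows > 5 the interpreter raises the following error: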

xls2csv ('e:/t.xls', 'e:/wh.csv') 
WARNING *** file size (353829) not 512 + multiple of sector size (512) 
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero 
*** No CODEPAGE record, no encoding_override: will use 'ascii' 
*** No CODEPAGE record, no encoding_override: will use 'ascii' 
Traceback (most recent call last): 

    File "<ipython-input-14-ccae93f2d633>", line 1, in <module> 
    xls2csv ('e:/t.xls', 'e:/wh.csv') 

    File "C:/Users/hey/.spyder/temp.py", line 10, in xls2csv 
    wb = xlrd.open_workbook(xls_filename) 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook 
    ragged_rows=ragged_rows, 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls 
    bk.get_sheets() 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 678, in get_sheets 
    self.get_sheet(sheetno) 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 669, in get_sheet 
    sh.read(self) 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\sheet.py", line 804, in read 
    strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2) 

    File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string 
    return unicode(data[pos:pos+nchars], encoding) 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xb2 in position 2: ordinal not in range(128)

Source: 2017-05-19, geoinfo

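This is a decoding problem that occurs when the XLS file is opened. I suspect the 5th row of the xls file contains special characters. Based on the xlrd documentation, you can pass encoding_override="cp1251" so that the text is translated to Unicode: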

wb = xlrd.open_workbook(xls_filename, encoding_override="cp1251")
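To see why the override is needed, and why the choice of code page matters, here is a minimal sketch that uses only the standard library. The sample bytes are made up for illustration; only the byte 0xb2 comes from the traceback. Whether cp1251 is the right code page for a particular file is an assumption that has to be checked against the data, since the same byte decodes to different characters under different code pages.

```python
# Illustration only: a made-up byte string containing 0xb2, the byte
# reported in the UnicodeDecodeError above.
raw = b"m\xb2"


def try_decode(data, encoding):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII only covers bytes 0x00-0x7f, so 0xb2 is rejected.
ascii_text = try_decode(raw, "ascii")

# Single-byte code pages accept 0xb2, but map it to different characters.
cp1251_text = try_decode(raw, "cp1251")
cp1252_text = try_decode(raw, "cp1252")

print(repr(ascii_text), repr(cp1251_text), repr(cp1252_text))
```

If the characters in the resulting CSV look wrong with one code page, trying another (for example cp1252 for Western European files) may be worth it.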

Source: 2017-05-19 15:01:10

Do you know how to set the delimiter of the generated CSV to ';'? – geoinfo

Just use: csv_out = unicodecsv.writer(fh, delimiter=';', encoding='utf-8')
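As a rough sketch of what the delimiter setting does, here is the same idea with the standard-library csv module, assuming Python 3 (where csv handles Unicode text itself, so unicodecsv is not needed). The rows are made-up sample data. Note that a field which itself contains ';' gets quoted automatically.

```python
import csv
import io

# Made-up sample rows; the second data field contains the delimiter itself.
rows = [["id", "name"], [1, "a;b"]]

# Write to an in-memory buffer so the result can be inspected directly.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=";")
writer.writerows(rows)

output = buf.getvalue()
print(output)
```

When writing to a real file in Python 3, opening it with open(csv_filename, "w", newline="", encoding="utf-8") avoids blank lines between rows on Windows.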

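It looks like the error is not caused by the number of rows, but by a problem handling the Unicode characters in the source file.

I suggest you try Pandas: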

import pandas as pd 

df = pd.read_excel('input.xls') 
df.to_csv('output.csv', encoding='utf-8')

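Note (since you did not expand on the Postgres part): if this is a first step toward getting your data into Postgres, then once your data is loaded into a Pandas DataFrame, you can send it straight to Postgres.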
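A minimal sketch of that last step, using DataFrame.to_sql. To keep it runnable without a database server, an in-memory SQLite connection stands in for Postgres here; the table name xls_data, the sample data, and the connection URL in the comment are all hypothetical. For Postgres you would pass an SQLAlchemy engine instead of the SQLite connection.

```python
import sqlite3

import pandas as pd

# Stand-in for df = pd.read_excel('input.xls'); the data is made up.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Assumption: for Postgres, replace this connection with an SQLAlchemy engine,
# e.g. create_engine("postgresql://user:password@localhost/mydb") (hypothetical URL).
conn = sqlite3.connect(":memory:")

# Create (or replace) the table and insert all rows, without the DataFrame index.
df.to_sql("xls_data", conn, if_exists="replace", index=False)

# Read the rows back to confirm they were written.
loaded = pd.read_sql_query("SELECT * FROM xls_data ORDER BY id", conn)
conn.close()

print(loaded)
```

This skips the intermediate CSV file entirely, which also sidesteps the delimiter and quoting questions.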

Source: 2017-05-19 14:53:58

I have already tested that, but it did not work. Here is the error: – geoinfo
