Python: parsing an .xls file fails with both xlrd and pandas
I am trying to parse an .xls file. I tried:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy.random as np
import sys
print 'Python version ' + sys.version
print 'Pandas version: ' + pd.__version__
# Parse a specific sheet
df = pd.read_excel('NextDebitCreditCard.xls', 0, index_col='StatusDate')
df.dtypes
But I keep getting:
File "/usr/lib/python2.7/dist-packages/xlrd/book.py", line 1252, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<html la'
I get the same error when I use xlrd directly. I am not sure whether this is a regular .xls file, so here are the beginning and the end of the file:
<html lang="he">
<head>
<META CONTENT="text/html" HTTP-EQUIV="Content-Type" charset="iso-8859-8"></META><META CONTENT="no-cache" HTTP-EQUIV="Pragma"></META><META CONTENT="0" HTTP-EQUIV="expires"></META><title>
<TEXT>
some text here
.....
.....
.....
.....
₪ 942.56</td></tr></table>
</div>
</div></td><td class="homeMessagesTd" id="leftSide">
</td></tr></table></form></body></html>
Any ideas? Thanks!
It looks like someone saved the Excel file as HTML ... – MattDMo 2014-11-24 22:06:12
Hmm, yes, I can see that, but since Microsoft Excel has no problem reading it, I thought maybe this was just normal. Is there a parser for it? – Yotam 2014-11-24 22:30:37
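
Since the error message shows the file starting with '<html la' rather than a BOF record, one way to handle it is to check the first bytes of the file and, if it is HTML, pull the table rows out with an HTML parser instead of xlrd. The following is only a minimal sketch under some assumptions: it is written for Python 3 (the post uses Python 2.7), the helper names `sniff_format`, `TableParser` and `parse_html_rows` are made up for illustration, the default encoding is taken from the `charset="iso-8859-8"` META tag shown above, and it only handles flat (non-nested) tables, so the nested layout in the real file would probably need more work.

```python
from html.parser import HTMLParser

# First 8 bytes of a genuine binary .xls file (OLE2 compound document).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_format(head):
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.lstrip().lower().startswith((b"<html", b"<!doctype", b"<table")):
        return "html"
    return "unknown"


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row.

    Assumption: tables are flat. Nested tables are not handled correctly.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()      # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()     # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def parse_html_rows(raw, encoding="iso-8859-8"):
    """Decode raw bytes and return the table rows as lists of strings."""
    parser = TableParser()
    parser.feed(raw.decode(encoding, errors="replace"))
    parser.close()
    return parser.rows
```

Usage would be roughly: read the file in binary mode, call `sniff_format(raw[:16])`, and only when it returns "html" pass `raw` to `parse_html_rows`; the resulting rows can then be handed to `pd.DataFrame(rows[1:], columns=rows[0])` if the first row holds the headers. If lxml or BeautifulSoup with html5lib is installed, `pd.read_html('NextDebitCreditCard.xls')` may be a shorter route, since it returns a list of DataFrames, one per table it finds in the page.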