CParserError when reading a CSV file into Python Spyder
I want to read a large CSV file (about 17 GB) into Python in Spyder using the pandas module. Here is my code:
data = pd.read_csv('example.csv', encoding='ISO-8859-1')
But I keep getting a CParserError error message:
Traceback (most recent call last):
File "<ipython-input-3-3993cadd40d6>", line 1, in <module>
data =pd.read_csv('newsall.csv', encoding = 'ISO-8859-1')
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 325, in _read
return parser.read()
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
ret = self._engine.read(nrows)
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003)
File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
CParserError: Error tokenizing data. C error: out of memory
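I know there are some discussions about this problem, but they seem very specific and differ from case to case. Can anyone help me?
I am using Python 3 on Windows. Thanks in advance.
Edit:
As suggested by ResMar, I tried the following code: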
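One common way to lower the memory needed by read_csv is to load only the columns that are actually used and to give them compact dtypes. The sketch below is only an illustration: the column names (id, category, body) and the sample rows are made up, since the layout of the real file is not shown here.

```python
import io
import pandas as pd

# Made-up sample data standing in for the real file (hypothetical columns).
csv_text = "id,category,body\n" + "".join(
    f"{i},news,some long text {i}\n" for i in range(1000)
)

# Default read: every column, default dtypes.
full = pd.read_csv(io.StringIO(csv_text))

# Slimmer read: only two columns, with smaller dtypes.
slim = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "category"],
    dtype={"id": "int32", "category": "category"},
)

# Compare the in-memory footprint of both DataFrames, in bytes.
full_bytes = int(full.memory_usage(deep=True).sum())
slim_bytes = int(slim.memory_usage(deep=True).sum())
print(full_bytes, slim_bytes)
```

Whether this is enough for a 17 GB file depends on how many columns can be dropped and on the available RAM, so it may need to be combined with chunked reading.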
data = pd.DataFrame()
reader = pd.read_csv('newsall.csv', encoding='ISO-8859-1', chunksize=10000)
for chunk in reader:
    data.append(chunk, ignore_index=True)
But it returned nothing:
data.shape
Out[12]: (0, 0)
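The (0, 0) shape is consistent with how DataFrame.append worked: it returned a new DataFrame and left the original untouched, so a loop that ignores the return value discards every chunk. The method was also removed in pandas 2.0. A minimal sketch of the usual alternative collects the chunks in a list and calls pd.concat once; the in-memory sample and its column names are made up for illustration.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the real file (hypothetical columns).
sample_csv = io.StringIO("id,title\n1,a\n2,b\n3,c\n4,d\n5,e\n")

# Collect each chunk in a plain list; list.append modifies the list in place.
chunks = []
for chunk in pd.read_csv(sample_csv, chunksize=2):
    chunks.append(chunk)

# Concatenate once at the end instead of growing a DataFrame in the loop.
data = pd.concat(chunks, ignore_index=True)
print(data.shape)  # (5, 2)
```

Note that this still builds the complete DataFrame in memory, so on its own it does not make a file fit that is larger than the available RAM.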
Then I tried the following code:
data = pd.DataFrame()
reader = pd.read_csv('newsall.csv', encoding='ISO-8859-1', chunksize=10000)
for chunk in reader:
    data = data.append(chunk, ignore_index=True)
This again raises the out-of-memory error; here is the traceback:
Traceback (most recent call last):
File "<ipython-input-23-ee9021fcc9b4>", line 3, in <module>
for chunk in reader:
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 795, in __next__
return self.get_chunk()
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 836, in get_chunk
return self.read(nrows=size)
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 815, in read
ret = self._engine.read(nrows)
File "I:\Program Files\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1314, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748)
File "pandas\parser.pyx", line 839, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9208)
File "pandas\parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas\parser.c:9731)
File "pandas\parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:9602)
File "pandas\parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas\parser.c:23325)
CParserError: Error tokenizing data. C error: out of memory
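Accumulating every chunk into one DataFrame still requires the whole dataset to fit in memory, so chunked reading usually only helps when each chunk is reduced to something small before the next one is read. The sketch below shows one possible shape for such a loop. The do_something function, the column names (source, words) and the sample rows are all hypothetical, because the real contents of the file are not shown.

```python
import io
from collections import Counter

import pandas as pd

# Made-up sample standing in for the real file (hypothetical columns).
sample_csv = io.StringIO(
    "source,words\n"
    "bbc,120\n"
    "cnn,80\n"
    "bbc,200\n"
    "cnn,50\n"
    "ap,30\n"
)


def do_something(chunk):
    """Hypothetical per-chunk reduction: total words per source."""
    return chunk.groupby("source")["words"].sum()


# Keep only the small running summary, never the chunks themselves.
totals = Counter()
for chunk in pd.read_csv(sample_csv, chunksize=2):
    for source, words in do_something(chunk).items():
        totals[source] += int(words)

print(dict(totals))
```

Each chunk becomes eligible for garbage collection once the loop moves on, so peak memory is governed by the chunk size rather than by the file size.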
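Thank you for your answer. I just want to read the data in as a DataFrame; what code should I write for do_something?
That is up to you to decide.
Could you take a look at my edited question? It still gives the error.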