How to set the correct parameters when reading a CSV file (Python, pandas)

Training data: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
Test data: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test
import numpy as np
import pandas as pd
train_data = pd.read_csv('adult.data.txt',sep= ',', header= None)
test_data = pd.read_csv('adult.test.txt',sep= ',', header= None)
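When I do this, reading the test data raises an error, but reading the training data does not, even though both files have the same layout: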
Traceback (most recent call last):
  File "dtree.py", line 61, in <module>
    dtree()
  File "dtree.py", line 12, in dtree
    test_data = pd.read_csv('adult.test.txt',sep= ',', header= None)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 285, in _read
    return parser.read()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1197, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 15
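The error message is consistent with a non-record first line in the test file: with `header=None`, the C parser takes the first line's field count as the expected number of fields, so a first line without commas sets that to 1 and the 15 fields on line 2 are reported as an error. The sketch below tries to reproduce this on a small in-memory file. The sample rows, the placeholder first line and the helper `try_read_naive` are hypothetical, and it assumes a recent pandas where the exception is `pandas.errors.ParserError` (older versions raise `pandas.parser.CParserError`, as in the traceback).

```python
import io

import pandas as pd

# Hypothetical miniature of the test file: a first line without commas,
# then records with 15 comma-separated fields.
SAMPLE_TEST = (
    "this first line is not a record\n"
    "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, "
    "Black, Male, 0, 0, 40, United-States, <=50K.\n"
    "44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, "
    "Husband, Black, Male, 7688, 0, 40, United-States, >50K.\n"
)


def try_read_naive(text):
    """Read the text the way the question does (header=None).

    Returns the parser error message, or None if parsing succeeds.
    """
    try:
        pd.read_csv(io.StringIO(text), sep=",", header=None)
    except pd.errors.ParserError as exc:
        return str(exc)
    return None


print(try_read_naive(SAMPLE_TEST))
```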
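So I changed to header=0 for test_data, and then it ran without an error, but it produced only 1 column instead of the 15 columns that train_data has. This causes problems, because test_data.values returns only the last column, unlike train_data.values.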
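The one-column result with `header=0` is probably pandas' implicit-index rule rather than lost data: when the header row has fewer fields than the data rows, the extra leading fields are used as the row index. Here the one-field first line becomes the only column name, so the first 14 fields of each record would go into a 14-level `MultiIndex` and only the last field (the label) remains a column, which is what `.values` then returns. A minimal sketch on a hypothetical in-memory sample (rows and first line are made up):

```python
import io

import pandas as pd

# Hypothetical miniature of the test file: a first line without commas,
# then records with 15 comma-separated fields.
SAMPLE_TEST = (
    "this first line is not a record\n"
    "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, "
    "Black, Male, 0, 0, 40, United-States, <=50K.\n"
    "44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, "
    "Husband, Black, Male, 7688, 0, 40, United-States, >50K.\n"
)

# header=0 makes the one-field first line the only column name. The data rows
# have 14 more fields than that, so pandas uses those 14 leading fields as an
# implicit row index and keeps only the last field as a column.
df = pd.read_csv(io.StringIO(SAMPLE_TEST), sep=",", header=0)

print(df.shape)          # (2, 1)
print(df.index.nlevels)  # 14
```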
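I noticed two differences between the test and training data: in the test data every line ends with a full stop (period), whereas the training lines have nothing at the end; and the first line of the test data is not a record like the entries in the training data. Is either of these causing the problem? How can I get around them?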
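Both differences can probably be handled with `read_csv` parameters plus one cleanup step: `skiprows=1` drops the non-record first line of the test file, and stripping a trailing `.` from the last column makes labels such as `<=50K.` match the training labels. This is a minimal sketch that assumes the label is the last column; it also adds `skipinitialspace=True` on the assumption that fields are separated by `", "` as in the sample rows. The helper `read_adult` and both samples are hypothetical.

```python
import io

import pandas as pd

# Hypothetical miniature of the training file: 15 comma-separated fields per
# line, no header row, no trailing full stop on the label.
TRAIN_SAMPLE = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial, "
    "Wife, White, Female, 15024, 0, 40, United-States, >50K\n"
)

# Hypothetical miniature of the test file: a non-record first line and a
# trailing full stop on every label.
TEST_SAMPLE = (
    "this first line is not a record\n"
    "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, "
    "Black, Male, 0, 0, 40, United-States, <=50K.\n"
    "44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, "
    "Husband, Black, Male, 7688, 0, 40, United-States, >50K.\n"
)


def read_adult(source, is_test=False):
    """Read one adult file into a DataFrame (hypothetical helper)."""
    df = pd.read_csv(
        source,
        sep=",",
        header=None,                   # neither file has a header row
        skiprows=1 if is_test else 0,  # skip the test file's non-record first line
        skipinitialspace=True,         # drop the space after each comma
    )
    label = df.columns[-1]
    # Remove the trailing full stop so test labels match training labels.
    df[label] = df[label].str.rstrip(".")
    return df


train_data = read_adult(io.StringIO(TRAIN_SAMPLE))
test_data = read_adult(io.StringIO(TEST_SAMPLE), is_test=True)
print(train_data.shape, test_data.shape)  # (2, 15) (2, 15)
print(test_data[14].tolist())             # ['<=50K', '>50K']
```

For the files in the question this would be `read_adult('adult.data.txt')` and `read_adult('adult.test.txt', is_test=True)`; both should then have 15 columns, so `.values` returns arrays of the same width.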