Try this:
In [59]: cols = 'PRICE YEAR MONTH'.split()
In [60]: cols
Out[60]: ['PRICE', 'YEAR', 'MONTH']
In [61]: for c in cols:
    ...:     df[c] = pd.to_numeric(df[c], errors='coerce')
    ...:
In [62]: df
Out[62]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000.0 2012 4 FORD
1 100001 10000.0 2015 5 MERCEDES
2 100002 NaN 2016 6 AUDI
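For reference, here is a self-contained sketch of the same conversion outside an interactive session. The sample frame is an assumption rebuilt from the output above, and the string 'n/a' is a made-up stand-in for whatever unparseable value ends up as NaN in row 2.

```python
import pandas as pd

# Hypothetical sample data modelled on the output above.
# 'n/a' is an assumed bad value, not taken from the original dataset.
df = pd.DataFrame({
    'Ref_id': ['100000', '100001', '100002'],
    'PRICE': ['5000', '10000', 'n/a'],
    'YEAR': ['2012', '2015', '2016'],
    'MONTH': ['4', '5', '6'],
    'BRAND': ['FORD', 'MERCEDES', 'AUDI'],
})

cols = ['PRICE', 'YEAR', 'MONTH']

# errors='coerce' replaces anything that cannot be parsed with NaN
# instead of raising, so one bad cell does not abort the conversion.
for c in cols:
    df[c] = pd.to_numeric(df[c], errors='coerce')

print(df)
print(df.dtypes)
```

PRICE comes out as a float column because NaN cannot be stored in a plain integer column, while YEAR and MONTH contain no bad values and become integers.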
Reproducing your error:
In [65]: df
Out[65]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000 2012 4 FORD
1 100001 10000 2015 5 MERCEDES
2 100002 PRICE 2016 6 AUDI # note the string `PRICE` in the PRICE column
In [66]: df['PRICE'].astype(int)
...
skipped
...
ValueError: invalid literal for int() with base 10: 'PRICE'
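If it is not obvious which values trigger the error, one way to locate them is to coerce the column and look at the entries that turn into NaN. This is only a sketch; the series below is hypothetical data modelled on the frame above.

```python
import pandas as pd

# Hypothetical column modelled on the frame above.
price = pd.Series(['5000', '10000', 'PRICE'], name='PRICE')

# Entries that fail numeric parsing become NaN after coercion.
# The notna() check on the original excludes entries that were
# already missing before the conversion.
bad_mask = pd.to_numeric(price, errors='coerce').isna() & price.notna()
bad_values = price[bad_mask]

print(bad_values)
```

The index of bad_values tells you which rows to inspect in the original data.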
As @jezrael has added in this comment, you most likely have "bad" (unexpected) values in your dataset.
You can use one of the following methods to clean it up:
In [155]: df
Out[155]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000 2012 4 FORD
1 100001 10000 2015 5 MERCEDES
2 Ref_id PRICE YEAR MONTH BRAND
3 100002 15000 2016 5 AUDI
In [156]: df.dtypes
Out[156]:
Ref_id object
PRICE object
YEAR object
MONTH object
BRAND object
dtype: object
In [157]: df = df.drop(df.loc[df.PRICE == 'PRICE'].index)
In [158]: df
Out[158]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000 2012 4 FORD
1 100001 10000 2015 5 MERCEDES
3 100002 15000 2016 5 AUDI
In [159]: for c in cols:
     ...:     df[c] = pd.to_numeric(df[c], errors='coerce')
     ...:
In [160]: df
Out[160]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000 2012 4 FORD
1 100001 10000 2015 5 MERCEDES
3 100002 15000 2016 5 AUDI
In [161]: df.dtypes
Out[161]:
Ref_id object
PRICE int64
YEAR int64
MONTH int64
BRAND object
dtype: object
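The filter above keys on the PRICE column only. A slightly more general sketch, assuming the stray rows are exact copies of the header line, drops every row whose cells all equal their column names. The sample frame is rebuilt from the output above.

```python
import pandas as pd

columns = ['Ref_id', 'PRICE', 'YEAR', 'MONTH', 'BRAND']

# Hypothetical data: row 2 is a repeated header line read in as data.
df = pd.DataFrame(
    [['100000', '5000', '2012', '4', 'FORD'],
     ['100001', '10000', '2015', '5', 'MERCEDES'],
     ['Ref_id', 'PRICE', 'YEAR', 'MONTH', 'BRAND'],
     ['100002', '15000', '2016', '5', 'AUDI']],
    columns=columns,
)

# A row counts as a repeated header if every cell equals its column name.
is_header = df.eq(list(df.columns)).all(axis=1)
clean = df[~is_header].copy()

# With the header rows gone, the numeric columns parse cleanly.
cols = ['PRICE', 'YEAR', 'MONTH']
for c in cols:
    clean[c] = pd.to_numeric(clean[c], errors='coerce')

print(clean.dtypes)
```

Repeated header rows like this often appear when several CSV files are concatenated as text before loading, although that is only a guess about the original data.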
Or simply:
In [159]: for c in cols:
     ...:     df[c] = pd.to_numeric(df[c], errors='coerce')
     ...:
In [165]: df
Out[165]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000.0 2012.0 4.0 FORD
1 100001 10000.0 2015.0 5.0 MERCEDES
2 Ref_id NaN NaN NaN BRAND
3 100002 15000.0 2016.0 5.0 AUDI
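and then .dropna(how='any'), if you know that there were no NaNs in your original dataset (otherwise rows with genuinely missing values would be dropped as well):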
In [166]: df = df.dropna(how='any')
In [167]: df
Out[167]:
Ref_id PRICE YEAR MONTH BRAND
0 100000 5000.0 2012.0 4.0 FORD
1 100001 10000.0 2015.0 5.0 MERCEDES
3 100002 15000.0 2016.0 5.0 AUDI
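Note that after this route the numeric columns stay float64. A possible follow-up, sketched here on hypothetical data rebuilt from the output above, restricts dropna to the converted columns and then casts them back to integers.

```python
import pandas as pd

# Hypothetical data: row 2 is a repeated header line read in as data.
df = pd.DataFrame(
    [['100000', '5000', '2012', '4', 'FORD'],
     ['100001', '10000', '2015', '5', 'MERCEDES'],
     ['Ref_id', 'PRICE', 'YEAR', 'MONTH', 'BRAND'],
     ['100002', '15000', '2016', '5', 'AUDI']],
    columns=['Ref_id', 'PRICE', 'YEAR', 'MONTH', 'BRAND'],
)

cols = ['PRICE', 'YEAR', 'MONTH']
for c in cols:
    df[c] = pd.to_numeric(df[c], errors='coerce')

# subset=cols limits the NaN check to the converted columns, so a row
# is not dropped merely because some other column is missing a value.
df = df.dropna(subset=cols)

# With the NaNs gone, the float columns can be cast back to integers.
df = df.astype({c: 'int64' for c in cols})

print(df.dtypes)
```

The cast would raise if any NaN were left in those columns, which is why it comes after dropna.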
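Please post the raw data and the full code that produces the error – EdChum
Sorry, my data and code are confidential, so I can't share them. I can show you a (test) dataset and the error. –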