2017-03-09

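Read a URL into a pandas DataFrame with column names (Python 3)

I have read several questions on this topic, but none of the answers seem to work for me.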

I want to retrieve the data from this page, "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat", and assign names to the columns.

My code is below. It doesn't let me assign names to the data columns, because everything ends up in a single column:

import pandas as pd 
import io 
import requests 
url="http://archive.ics.uci.edu/ml/machine-learningdatabases/statlog/heart/heart.dat" 
s=requests.get(url).content 
header_row = ['age','sex','chestpain','restBP','chol','sugar','ecg','maxhr','angina','dep','exercise','fluor','thal','diagnosis'] 
c=pd.read_csv(io.StringIO(s.decode('utf-8')), names=header_row) 
print(c) 

The output is:

 age sex chestpain \ 
0 70.0 1.0 4.0 130.0 322.0 0.0 2.0 109.0 0.0 2.4... NaN  NaN 
1 67.0 0.0 3.0 115.0 564.0 0.0 2.0 160.0 0.0 1.6... NaN  NaN 
2 57.0 1.0 2.0 124.0 261.0 0.0 0.0 141.0 0.0 0.3... NaN  NaN 
3 64.0 1.0 4.0 128.0 263.0 0.0 0.0 105.0 1.0 0.2... NaN  NaN 

What do I need to do to achieve this?

Thanks a lot!


Are you sure about the URL? I get a 404 error when I open it.


The correct URL is https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat

Answer


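The link you provided is missing a hyphen; I have corrected it in my answer. Basically, you need to decode the bytes in s as UTF-8, split the resulting string on \n to get each row, and then split each row on whitespace to get the individual values. This gives you a nested-list representation of the dataset, which you can convert into a pandas DataFrame and then assign the column names to. Note that split() returns strings, so convert the values to numbers afterwards.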

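import pandas as pd
import requests

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat"
response = requests.get(url)
response.raise_for_status()  # fail loudly on a bad URL (e.g. a 404) instead of parsing an error page
s = response.content.decode('utf-8')
# split into rows, dropping blank lines (such as one left by a trailing newline),
# which would otherwise become rows of None values
s_rows = [row for row in s.split('\n') if row.strip()]
# split each row on any run of whitespace to get the individual values
s_rows_cols = [each.split() for each in s_rows]
header_row = ['age','sex','chestpain','restBP','chol','sugar','ecg','maxhr','angina','dep','exercise','fluor','thal','diagnosis']
# split() yields strings, so convert every column to float
c = pd.DataFrame(s_rows_cols, columns=header_row).astype(float)
print(c.head())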
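Alternatively, pandas.read_csv can do the whitespace splitting itself. It separates fields on commas by default, and since the rows in heart.dat contain no commas, each whole line lands in the first column (age) with NaN everywhere else, which matches the output in the question. Passing sep=r'\s+' splits on runs of spaces instead. Below is a minimal sketch, assuming the file is whitespace-delimited with no header row (as the question's output suggests). load_heart and parse_heart_text are hypothetical helper names, not part of pandas.

```python
import io

import pandas as pd

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat"

HEADER_ROW = ['age', 'sex', 'chestpain', 'restBP', 'chol', 'sugar', 'ecg',
              'maxhr', 'angina', 'dep', 'exercise', 'fluor', 'thal', 'diagnosis']


def load_heart(source=URL):
    """Hypothetical helper: read whitespace-separated heart.dat with named columns.

    `source` may be a URL, a file path, or a file-like object; read_csv accepts all three.
    """
    # sep=r'\s+' splits on any run of whitespace instead of the default comma.
    # Passing names=... makes read_csv behave as if header=None, so the first
    # data row is kept as data rather than being used as column names.
    return pd.read_csv(source, sep=r'\s+', names=HEADER_ROW)


def parse_heart_text(text):
    """Hypothetical helper for text you already downloaded, e.g. s.decode('utf-8')."""
    # Wrap the string in a file-like object, as the question's code did.
    return load_heart(io.StringIO(text))


if __name__ == "__main__":
    df = load_heart()  # reads straight from the URL; needs network access
    print(df.head())
```

Because read_csv parses the numbers itself, the columns come back as numeric dtypes without a separate conversion step, and there is no need to filter blank lines by hand.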

Thank you so much! That's exactly what I needed! Best regards!