2017-09-25 98 views
0

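Converting a list of data from a URL to CSV in Python

I am trying to convert the Breast Cancer Wisconsin dataset from a list into a DataFrame with columns.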

Here is the dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

These are the column names:

# Attribute      Domain 
    -- ----------------------------------------- 
    1. Sample code number   id number 
    2. Clump Thickness    1 - 10 
    3. Uniformity of Cell Size  1 - 10 
    4. Uniformity of Cell Shape  1 - 10 
    5. Marginal Adhesion    1 - 10 
    6. Single Epithelial Cell Size 1 - 10 
    7. Bare Nuclei     1 - 10 
    8. Bland Chromatin    1 - 10 
    9. Normal Nucleoli    1 - 10 
    10. Mitoses      1 - 10 
    11. Class:      (2 for benign, 4 for malignant) 

I imported the dataset into Python like this:

import requests

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data" 
f = requests.get(link) 

print (f.text) 

and I see the data as a list with commas:

1000025,5,1,1,1,2,1,3,1,1,2 
1002945,5,4,4,5,7,10,3,2,1,2 
1015425,3,1,1,1,2,2,3,1,1,2 
1016277,6,8,8,1,3,4,3,7,1,2 
1017023,4,1,1,3,2,1,3,1,1,2 

I need to split the values on the commas into columns and add names to the columns.

I tried this, but it did not work:

import requests 
import pandas as pd 
import io 

urlData = requests.get(f.text).content 
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8'))) 
+0

Possible duplicate: [link](https://stackoverflow.com/a/41880513/3959965) – dalonlobo

+0

Possible duplicate of [Pandas read_csv from url](https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url) – miradulo

+0

Just `pd.read_csv(link, header=None)` - much simpler :) – miradulo

Answers

-1
import requests 
import pandas as pd 
import io 

names = ['Sample code number', 
     'Clump Thickness', 
     'Uniformity of Cell Size', 
     'Uniformity of Cell Shape', 
     'Marginal Adhesion', 
     'Single Epithelial Cell Size', 
     'Bare Nuclei', 
     'Bland Chromatin', 
     'Normal Nucleoli', 
     'Mitoses', 
     'Class'] 

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data" 
csv_text = requests.get(link).text 
# if you don't care about column names, omit names=names and pass header=None instead
df = pd.read_csv(io.StringIO(csv_text), names=names) 
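
To try this without a network call, here is a minimal sketch that parses the five sample rows from the question instead of the downloaded text. The variable `sample_text` is a stand-in for `requests.get(link).text`, and the final `to_csv` step is my addition to show one way of getting CSV text with a header row back out; it is not part of the answer above.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for requests.get(link).text
sample_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
"""

names = ['Sample code number',
         'Clump Thickness',
         'Uniformity of Cell Size',
         'Uniformity of Cell Shape',
         'Marginal Adhesion',
         'Single Epithelial Cell Size',
         'Bare Nuclei',
         'Bland Chromatin',
         'Normal Nucleoli',
         'Mitoses',
         'Class']

# The data has no header row, so names= supplies the column labels.
df = pd.read_csv(io.StringIO(sample_text), names=names)

# With no path argument, to_csv returns the CSV as a string.
# index=False leaves out the row index so only the eleven columns are written.
csv_out = df.to_csv(index=False)
print(csv_out.splitlines()[0])
```

Passing a file path to `to_csv` instead would write the same content to disk.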
-1

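I am sure there is a better way to do this, but... I sent the output to a CSV file with a static header line. Since the data is already delimited by ",", I figured this would be the simplest approach.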

import requests 

def main(): 
    outputFile = 'someName.csv' 
    link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data" 
    f = requests.get(link) 
    headerLine = ("Sample code number(id number),Clump Thickness(1 - 10),Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)") 
    data = f.text 
    # Write the static header line first, then the raw comma-separated data.
    with open(outputFile, "w") as ofile: 
        ofile.write(headerLine + '\n') 
        ofile.write(data) 
    print("Success") 

if __name__ == '__main__': 
    main() 
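
The header line above writes the Class domain as "2 for benign - 4 for malignant", presumably so the label contains no comma. As a hedged alternative, the standard `csv` module quotes any field that contains the delimiter, so the comma can stay. The sketch below writes to an in-memory buffer and uses two sample rows from the question in place of the download; the `fields` list and the "name (domain)" label format are my own illustration.

```python
import csv
import io

# Two sample rows from the question; stands in for the downloaded text.
sample_text = "1000025,5,1,1,1,2,1,3,1,1,2\n1002945,5,4,4,5,7,10,3,2,1,2\n"

# (name, domain) pairs taken from the attribute table in the question.
fields = [
    ("Sample code number", "id number"),
    ("Clump Thickness", "1 - 10"),
    ("Uniformity of Cell Size", "1 - 10"),
    ("Uniformity of Cell Shape", "1 - 10"),
    ("Marginal Adhesion", "1 - 10"),
    ("Single Epithelial Cell Size", "1 - 10"),
    ("Bare Nuclei", "1 - 10"),
    ("Bland Chromatin", "1 - 10"),
    ("Normal Nucleoli", "1 - 10"),
    ("Mitoses", "1 - 10"),
    ("Class", "2 for benign, 4 for malignant"),
]
header = [f"{name} ({domain})" for name, domain in fields]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)  # the label containing a comma is quoted automatically
writer.writerows(csv.reader(io.StringIO(sample_text)))  # copy the data rows through

output = buffer.getvalue()
print(output)
```

To write to disk instead, the buffer could be replaced with `open(outputFile, "w", newline="")`.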
0

This will do the trick:

import requests 

headers = ['sample', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class'] 
r = requests.get("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data").text 
# Write the header row first, then the raw comma-separated data. 
with open('c:\\users\\user\\desktop\\data.csv', 'w') as csvFile: 
    csvFile.write(','.join(headers) + "\n") 
    csvFile.write(r) 
0

The following worked for me:

import pandas as pd 
import requests 
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data" 
f = requests.get(link) 
# separate each line 
newf = f.text.splitlines() 
# create pandas dataframe 
df = pd.DataFrame([x.split(",") for x in newf])
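
Note that `str.split` produces strings and the resulting DataFrame has default integer column labels. A minimal sketch of adding the names and converting to numbers follows. It uses the five sample rows from the question in place of `f.text.splitlines()`, and the `names` list and the `pd.to_numeric` step are my additions, assuming every field is meant to be numeric.

```python
import pandas as pd

# Sample rows from the question; stands in for f.text.splitlines()
sample_lines = [
    "1000025,5,1,1,1,2,1,3,1,1,2",
    "1002945,5,4,4,5,7,10,3,2,1,2",
    "1015425,3,1,1,1,2,2,3,1,1,2",
    "1016277,6,8,8,1,3,4,3,7,1,2",
    "1017023,4,1,1,3,2,1,3,1,1,2",
]

names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion',
         'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']

# columns= attaches the names while the DataFrame is built.
df = pd.DataFrame([line.split(",") for line in sample_lines], columns=names)

# Every value is still a string here; convert column by column.
# errors="coerce" turns any entry that is not a number into NaN instead of raising.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```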