2017-04-12

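Appending non-DataFrame data to a pandas CSV

I'd like to know whether there is a simpler way to append a date column and other information columns to my existing CSV file. I'm adding these columns because this information isn't in the JSON string returned by the REST API call.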

import requests 
import json 
import http.client 
import datetime 
import pandas as pd 
from pandas.io.json import json_normalize 

url = api.getinfo() 
r = requests.get(url, headers=headers, verify=False) 
if r.status_code != http.client.OK: 
    raise requests.HTTPError(r) 

jsonstring = json.dumps(r.json()["data"]) 
load = json.loads(jsonstring) 
df = json_normalize(load) 
col = ["poolId", "totalPoolCapacity", "totalLocatedCapacity", 
     "availableVolumeCapacity", "usedCapacityRate"] 
with open('hss.csv', 'a') as f: 
    df.to_csv(f, header=False, columns=col) 

a = pd.read_csv('hss.csv') 
a['date'] = [datetime.date.today()] * len(a) 
a.to_csv('hss.csv') 
b = pd.read_csv('hss.csv') 
b['storage system'] = "ssystem22" 
b.to_csv('hss.csv') 

Every time the script runs, I end up with extra 'Unnamed: 0' and 'Unnamed: 0.1' columns in my CSV file. Each append also overwrites the old dates.

,Unnamed: 0,Unnamed: 0.1,poolId,totalPoolCapacity, totalLocatedCapacity,availableVolumeCapacity,usedCapacityRate,date,storage system 
0,155472,223618,565064,51,,2017-04-12,ssystem22 
1,943174,819098,262042,58,,2017-04-12,ssystem22 
0,764600,966017,046668,71,,2017-04-12,ssystem22 
1,764600,335680,487650,76,,2017-04-12,ssystem22 
2,373700,459800,304446,67,,2017-04-12,ssystem22 

It's probably the index; use index=False when writing the CSV. http://pandas.pydata.org/pandas-docs/version/0.18.0/generated/pandas.DataFrame.to_csv.html – Shijo
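A minimal sketch of what this comment describes, using a small made-up DataFrame and an in-memory buffer (io.StringIO) in place of hss.csv. By default to_csv writes the index as a first column with an empty header, and read_csv names that column 'Unnamed: 0'. On the next round trip that column is written back under its name alongside a fresh unnamed index column, so the duplicate name is mangled to 'Unnamed: 0.1'. Passing index=False leaves the index out.

```python
import io

import pandas as pd

# Small made-up frame standing in for the API data.
df = pd.DataFrame({"poolId": [0, 1], "usedCapacityRate": [51, 58]})

# Default: the index is written as a first column with an empty header,
# which read_csv then names "Unnamed: 0".
buf = io.StringIO()
df.to_csv(buf)
with_index = pd.read_csv(io.StringIO(buf.getvalue()))
print(list(with_index.columns))  # ['Unnamed: 0', 'poolId', 'usedCapacityRate']

# With index=False only the real columns are written and read back.
buf = io.StringIO()
df.to_csv(buf, index=False)
without_index = pd.read_csv(io.StringIO(buf.getvalue()))
print(list(without_index.columns))  # ['poolId', 'usedCapacityRate']
```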

Thanks @Shijo. After adding 'index=False', I now have only one instance of 'Unnamed: 0' in the CSV file. – Clarkus978

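I don't understand why you keep reading the file back and rewriting it... why not add the columns to df before writing the CSV the first time? Just curious... – Shahram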
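This suggestion can be sketched as follows, assuming a hypothetical list of flat records in place of r.json()["data"] and an in-memory buffer in place of the CSV file. Assigning a scalar to a new column broadcasts it to every row, so the extra columns can be added before the single write, with no need to read the file back.

```python
import datetime
import io

import pandas as pd

# Hypothetical records standing in for r.json()["data"] from the REST API.
records = [
    {"poolId": 0, "totalPoolCapacity": 1000, "usedCapacityRate": 51},
    {"poolId": 1, "totalPoolCapacity": 2000, "usedCapacityRate": 58},
]
df = pd.json_normalize(records)  # pandas >= 1.0

# A scalar assigned to a new column is repeated for every row.
df["date"] = datetime.date.today().isoformat()
df["storage system"] = "ssystem22"

# One write, no header or index, so nothing has to be read back and rewritten.
buf = io.StringIO()
df.to_csv(buf, header=False, index=False)
print(buf.getvalue())
```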

Answer

I kept researching and found how to fix this. I should have been using pd.Series. Here is the corrected code:

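import datetime
import http.client

import pandas as pd
import requests

csvfile = 'hss.csv'  # assumed: same file as in the question

# `api` and `headers` come from the asker's own setup and are not defined here.
url = api.getinfo()
r = requests.get(url, headers=headers, verify=False)
if r.status_code != http.client.OK:
    raise requests.HTTPError(r)

# r.json() already returns Python objects, so a json.dumps/json.loads round trip is not needed.
# pd.json_normalize needs pandas >= 1.0; older versions: from pandas.io.json import json_normalize
df = pd.json_normalize(r.json()["data"])

# Add the values that are missing from the API response; every row gets the same value.
df['storage system'] = pd.Series('ssystem22', index=df.index)
df['date'] = pd.Series(datetime.date.today().strftime('%m-%d-%Y'),
                       index=df.index)
col = ["poolId", "totalPoolCapacity", "totalLocatedCapacity",
       "availableVolumeCapacity", "usedCapacityRate", "storage system",
       "date"]

# Append without header or index so repeated runs add no 'Unnamed' columns.
with open(csvfile, 'a', newline='') as f:
    df.to_csv(f, header=False, index=False, columns=col)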
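One way to extend this so that repeated runs build up a single CSV with exactly one header line is sketched below. append_snapshot is a hypothetical helper, not the answer author's code: it tags the rows and appends them with mode='a', writing the header only when the file does not exist yet. The column list is shortened and a temporary file stands in for hss.csv. Plain scalar assignment is used here; it has the same effect as pd.Series(value, index=df.index).

```python
import datetime
import os
import tempfile

import pandas as pd

# Shortened column list for the sketch.
COLUMNS = ["poolId", "usedCapacityRate", "storage system", "date"]


def append_snapshot(df, path, system_name):
    """Hypothetical helper: tag rows with a system name and today's date, then append."""
    out = df.copy()
    out["storage system"] = system_name
    out["date"] = datetime.date.today().strftime("%m-%d-%Y")
    # Write the header only on the first run, when the file does not exist yet.
    write_header = not os.path.exists(path)
    out.to_csv(path, mode="a", header=write_header, index=False, columns=COLUMNS)


df = pd.DataFrame({"poolId": [0, 1], "usedCapacityRate": [51, 58]})
path = os.path.join(tempfile.mkdtemp(), "hss.csv")
append_snapshot(df, path, "ssystem22")
append_snapshot(df, path, "ssystem22")  # simulate a second script run

result = pd.read_csv(path)
print(result.shape)  # (4, 4)
```

Because each run appends only its own rows, dates written by earlier runs are left untouched, which addresses the overwritten dates from the question.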