2016-07-16

Writing JSON with pandas Series and DataFrame

I am working with CSV files. My goal is to write the information from a CSV file in JSON format. Specifically, I want to get a format similar to miserables.json.

Example:

{"source": "Napoleon", "target": "Myriel", "value": 1}, 

With my data, the format would be:

[ 
{ 
    "source": "Germany", 
    "target": "Mexico", 
    "value": 1 
}, 
{ 
    "source": "Germany", 
    "target": "USA", 
    "value": 2 
}, 
{ 
    "source": "Brazil", 
    "target": "Argentina", 
    "value": 3 
} 
] 

However, the output of my code looks like this:

[ 
{ 
    "source": "Germany", 
    "target": "Mexico", 
    "value": 1 
}, 
{ 
    "source": null, 
    "target": "USA", 
    "value": 2 
} 
][ 
{ 
    "source": "Brazil", 
    "target": "Argentina", 
    "value": 3 
} 
] 

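The null source should be Germany. This is one of the main problems, because many more countries show the same issue. Apart from that, the information is correct. I just want to get rid of the multiple lists in the output and replace null with the correct country.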
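The ][ seam in that output is what makes the file invalid JSON: every pass of the loop appends a complete array, and a JSON document may contain only one top-level value. A minimal sketch with the standard-library json module; the two chunks are hypothetical sample data shortened from the output above.

```python
import json

# Hypothetical per-country chunks, shortened from the output above.
chunks = [
    [{"source": "Germany", "target": "Mexico", "value": 1}],
    [{"source": "Brazil", "target": "Argentina", "value": 3}],
]

# Appending each chunk's dump to the same file produces "[...][...]".
appended = "".join(json.dumps(chunk) for chunk in chunks)

try:
    json.loads(appended)
    error = None
except json.JSONDecodeError as exc:
    # The parser reads the first array, then rejects the rest as "Extra data".
    error = exc.msg
print(error)

# Flattening into one list and dumping once gives a single valid array.
combined = json.dumps([record for chunk in chunks for record in chunk])
print(len(json.loads(combined)))  # 2
```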

This is the code I am using, with pandas and collections:

import json
from collections import Counter

import pandas
from pandas import Series, DataFrame

csvdata = pandas.read_csv('file.csv', low_memory=False, encoding='latin-1')
countries = csvdata['country'].tolist()
newcountries = list(set(countries))
for element in newcountries:
    # count how often each target appears for this country
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)
    sourceTemp = []
    value = []
    country = element
    for k,v in frquency.items():
        sourceTemp.append(k)
        value.append(int(v))
    forceData = {'source': Series(country), 'target': Series(sourceTemp), 'value': Series(value)}
    dfForce = DataFrame(forceData)
    jsondata = dfForce.to_json(orient='records', force_ascii=False, default_handler=callable)
    parsed = json.loads(jsondata)
    newData = json.dumps(parsed, indent=4, ensure_ascii=False, sort_keys=True)
    # since to_json doesn't have append mode this will be written in txt file
    savetxt = open('data.txt', 'a')
    savetxt.write(newData)
    savetxt.close()

Any suggestions for solving this problem are appreciated!

Thanks


Can you provide a few lines of the input CSV file? –

Answer


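Consider removing the Series() wrapper around the scalar value country. Wrapping it creates a one-row series; when the dictionary of series is then upcast to a data frame, pandas pads that series with NaN (later converted to null in the JSON) to match the length of the other series. You can see this by printing out the dfForce data frame: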

from pandas import Series
from pandas import DataFrame

country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]

forceData = {'source': Series(country),
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
print(dfForce)

#     source     target  value
# 0  Germany     Mexico      1
# 1      NaN        USA      2
# 2      NaN  Argentina      3
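The NaN rows come from index alignment rather than from the data itself: a data frame lines its columns up by index label. A short sketch with the same sample values that makes the mismatch visible (source, target, aligned and filled are illustrative names, not from the original). The last lines show a second way around it, giving the scalar an explicit index so pandas repeats it.

```python
from pandas import DataFrame, Series

country = 'Germany'
sourceTemp = ['Mexico', 'USA', 'Argentina']
value = [1, 2, 3]

source = Series(country)       # one row, index [0]
target = Series(sourceTemp)    # three rows, index [0, 1, 2]
print(list(source.index), list(target.index))

# Labels 1 and 2 are missing from `source`, so aligning it to the
# longer index fills those cells with NaN (what DataFrame does here).
aligned = source.reindex(target.index)
print(int(aligned.isna().sum()))  # 2

# Passing the other column's index makes pandas repeat the scalar instead.
filled = Series(country, index=target.index)
dfForce = DataFrame({'source': filled, 'target': target, 'value': Series(value)})
print(dfForce['source'].tolist())  # ['Germany', 'Germany', 'Germany']
```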

To fix this, simply keep country as a scalar in the dictionary of series:

forceData = {'source': country,
             'target': Series(sourceTemp),
             'value': Series(value)}
dfForce = DataFrame(forceData)
print(dfForce)

#     source     target  value
# 0  Germany     Mexico      1
# 1  Germany        USA      2
# 2  Germany  Argentina      3
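If you would rather stay with pandas for the whole file, one alternative (a different technique from the per-country loop, sketched here under assumptions) is to count every (country, target) pair with groupby and call to_json once, so only one array is ever written. The CSV rows below are hypothetical; only the column names country and target come from the question.

```python
import io
import json

import pandas as pd

# Hypothetical sample input reproducing the expected counts
# (Germany->Mexico 1, Germany->USA 2, Brazil->Argentina 3).
csv_text = """country,target
Germany,Mexico
Germany,USA
Germany,USA
Brazil,Argentina
Brazil,Argentina
Brazil,Argentina
"""
csvdata = pd.read_csv(io.StringIO(csv_text))

# Count each (country, target) pair in one pass instead of looping per country.
links = (
    csvdata.groupby(['country', 'target'], sort=False)
    .size()
    .reset_index(name='value')
    .rename(columns={'country': 'source'})
)

# A single to_json call over the whole frame yields one JSON array.
jsondata = links.to_json(orient='records', force_ascii=False)
records = json.loads(jsondata)
print(json.dumps(records, indent=4, ensure_ascii=False))
```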

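By the way, you do not need a data frame object to output JSON. Simply use a list of dictionaries. Consider the following, which uses OrderedDict from the collections module to keep the keys in order. This way one growing list is dumped to the file once, instead of appending a separate list per country, which produces invalid JSON because adjacent brackets such as ...][... are not allowed.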

import json
from collections import Counter, OrderedDict
...

data = []

for element in newcountries:
    bills = csvdata['target'][csvdata['country'] == element]
    frquency = Counter(bills)

    for k, v in frquency.items():
        # OrderedDict keeps the keys in source, target, value order
        inner = OrderedDict()
        inner['source'] = element
        inner['target'] = k
        inner['value'] = int(v)

        data.append(inner)

# dump the whole list once, so the file holds a single JSON array
newData = json.dumps(data, indent=4)

with open('data.json', 'w') as savetxt:
    savetxt.write(newData)
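On Python 3.7 and later, plain dicts keep insertion order as a language guarantee, so OrderedDict is optional. A self-contained sketch of the same loop with plain dicts, assuming hypothetical in-memory CSV rows in place of file.csv, and using csvdata['country'].unique() (first-appearance order) instead of list(set(...)) so the output order is deterministic:

```python
import io
import json
from collections import Counter

import pandas as pd

# Hypothetical sample rows with the 'country' and 'target' columns the question uses.
csv_text = """country,target
Germany,Mexico
Germany,USA
Germany,USA
Brazil,Argentina
Brazil,Argentina
Brazil,Argentina
"""
csvdata = pd.read_csv(io.StringIO(csv_text))

data = []
# unique() keeps first-appearance order, unlike list(set(...)).
for element in csvdata['country'].unique():
    bills = csvdata.loc[csvdata['country'] == element, 'target']
    for k, v in Counter(bills).items():
        # Plain dicts keep insertion order, so keys serialize as source, target, value.
        data.append({'source': element, 'target': k, 'value': int(v)})

newData = json.dumps(data, indent=4, ensure_ascii=False)
print(newData)
```

Writing newData to data.json works exactly as in the answer's with open(...) block.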

Thanks @Parfait, that is much better. – estebanpdl