2017-04-24 73 views

Importing dictionary-like data into pandas

I have some data files written in a dictionary-like format:

{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"} 
{"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"} 
{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"} 
{"score": [0.9999467134475708, 5.334055458661169e-05], "key": "AmGdF7cY4X22", "label": "0"} 

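What I want to do is import them into pandas with 'key', 'label' and 'score' as columns, and the two numeric values in score must go into separate columns. I have tried importing the file as a dictionary, but I get: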

ValueError: too many values to unpack 

Any suggestions on how to solve this?


This error probably occurs because your file contains some content that does not conform to the dictionary format.
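
One way to test the suggestion in this comment is to parse the file one line at a time and report which lines fail. The sketch below is only an illustration: find_bad_lines is a hypothetical helper name, and the second sample line has been deliberately truncated to show what a failure looks like.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored rather than reported
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((number, str(exc)))
    return bad


# Two rows from the question; the second one is intentionally missing its closing brace.
sample = [
    '{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}',
    '{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"',
]

for number, message in find_bad_lines(sample):
    print(f"line {number}: {message}")
```

For a real file, the same helper could be called as find_bad_lines(open('file.json')), assuming the file is named as in the answer below.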

Answers

0

I think you need the lines=True parameter of read_json:

import pandas as pd

# lines=True tells pandas to parse one JSON object per line
df = pd.read_json('file.json', lines=True)
print(df)
      key label           score 
0 Am2mVTMbhd0y  0 [0.999580323696136, 0.00041968212462900004] 
1 AmG8StB8hM2k  0 [0.9997120499610901, 0.00028794570243900004] 
2 Alt137zv2nY6  0  [0.8841496109962461, 0.11585044860839801] 
3 AmGdF7cY4X22  0 [0.99994671344757, 5.3340554586611695e-05] 

print(type(df['score'].iat[0]))
<class 'list'> 

To convert the lists to columns, use the DataFrame constructor with concat:

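# drop the list column and append one new column per list element;
# columns='score' is used because recent pandas no longer accepts a positional axis
df = pd.concat([df.drop(columns='score'),
                pd.DataFrame(df['score'].values.tolist()).add_prefix('score')], axis=1)
print(df)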
      key label score0 score1 
0 Am2mVTMbhd0y  0 0.999580 0.000420 
1 AmG8StB8hM2k  0 0.999712 0.000288 
2 Alt137zv2nY6  0 0.884150 0.115850 
3 AmGdF7cY4X22  0 0.999947 0.000053 
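
The two steps above can be tried without a file on disk. This is a minimal sketch that assumes a reasonably recent pandas; io.StringIO stands in for file.json, and the names raw, demo and scores are only illustrative. It reuses two rows from the question.

```python
import io

import pandas as pd

# Two rows from the question, one JSON object per line.
raw = "\n".join([
    '{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}',
    '{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"}',
])

# Step 1: read line-delimited JSON; each score cell holds a Python list.
demo = pd.read_json(io.StringIO(raw), lines=True)

# Step 2: expand the lists into their own columns (score0, score1) and
# glue them onto the remaining columns.
scores = pd.DataFrame(demo["score"].tolist(), index=demo.index).add_prefix("score")
demo = pd.concat([demo.drop(columns="score"), scores], axis=1)

print(demo)
```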

Perfect! Thanks!

0
import pandas as pd 

# add your data to a list
data = [{"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"}, 
{"score": [0.9997120499610901, 0.00028794570243917406], "key": "AmG8StB8hM2k", "label": "0"}, 
{"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"}, 
{"score": [0.9999467134475708, 5.334055458661169e-05], "key": "AmGdF7cY4X22", "label": "0"}] 
# create the DataFrame
df = pd.DataFrame(data)
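
A DataFrame built this way still holds each score pair as a list in a single column. The sketch below shows one possible way to split it into the separate columns the question asks for; the names records, frame and scores are only illustrative, and two rows from the question are reused.

```python
import pandas as pd

records = [
    {"score": [0.9995803236961365, 0.00041968212462961674], "key": "Am2mVTMbhd0y", "label": "0"},
    {"score": [0.8841496109962463, 0.11585044860839844], "key": "Alt137zv2nY6", "label": "0"},
]

frame = pd.DataFrame(records)

# Build one column per list element, named score0 and score1.
scores = pd.DataFrame(frame["score"].tolist(), index=frame.index).add_prefix("score")

# Replace the list column with the expanded columns.
frame = frame.drop(columns="score").join(scores)

print(frame)
```

Unlike read_json, this route does no type inference on the values, so label stays a string here.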