2017-12-18 138 views
0

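I'm trying to work out the best way to convert a table into JSON records. I currently get the output I asked for, but the format of the table is tripping me up. The example below, from a pandas table scrape, should explain: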

ID    Product      Item_Material  Owner         Interest %
123   Test Item 1  Electric       Elctrotech    60%
null  null         null           Spark inc     40%
124   Test Item 2  Wood           TY Toys       100%
125   Test Item 3  Plastic        NA Materials  100%

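The newline-delimited JSON I get is what I asked for, but I would like any table row that belongs to a parent row to be nested inside the parent's JSON record.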

{"ID":"Test Item 1", "Item_Material":"Electric", "Owner":"Elctrotech", "Interest %":"60%"}
{"ID":null, "Item_Material":null, "Owner":"Spark inc", "Interest %":"40%"}
{"ID":"Test Item 2", "Item_Material":"Wood", "Owner":"TY Toys", "Interest %":"100%"}
{"ID":"Test Item 3", "Item_Material":"Plastic", "Owner":"NA Materials", "Interest %":"100%"}

The goal is for the first JSON record to look something like this:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owners": [{"Owner": "Elctrotech", "Interest %":"60%", "Owner":"Spark inc","Interest %":"40%"}]} 

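The data comes from a table scraped with Beautiful Soup, so when it is loaded into a pandas DataFrame it looks like the table above, with every row in its own <tr> tag. I don't know whether pandas has a function for merging a row into the row above it, so that I end up with one JSON record per 'Product'. An item can sometimes have more than two 'Owners'.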

+0

'JSON' can't have 'Owner' and 'Interest' twice inside one '{}'. It would also be easier if the 'Product' and 'Item_Material' columns held the correct values instead of 'null'. – furas

+1

Then you could use 'groupby()' to at least get the grouped elements, and it might be easier to save them as JSON. – furas

+0

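I noticed the {} error, thanks. Yes, that approach just occurred to me as well. Do you know a way to replace any 'null' value with the value from the previous row? I think that would let 'groupby' sort it out. – Chris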
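The forward-fill idea raised in this comment can be sketched with pandas' `ffill()`. This is only a minimal sketch: the DataFrame `df` and the list `parent_cols` are hypothetical names, and the data is a hand-built reconstruction of the table in the question, assuming the scraped nulls arrive as `None`/`NaN` rather than as the string "null".

```python
import pandas as pd

# Hypothetical reconstruction of the scraped table from the question
df = pd.DataFrame(
    [
        [123, "Test Item 1", "Electric", "Elctrotech", "60%"],
        [None, None, None, "Spark inc", "40%"],
        [124, "Test Item 2", "Wood", "TY Toys", "100%"],
        [125, "Test Item 3", "Plastic", "NA Materials", "100%"],
    ],
    columns=["ID", "Product", "Item_Material", "Owner", "Interest %"],
)

# Columns that describe the parent row (assumed; adjust to the real table)
parent_cols = ["ID", "Product", "Item_Material"]

# Copy each parent value down into the null rows that follow it
df[parent_cols] = df[parent_cols].ffill()

# Every owner row now carries its parent's keys, so groupby can collect them
owners_per_product = df.groupby("Product")["Owner"].apply(list).to_dict()
print(owners_per_product)
```

If the scrape yields the literal string "null", it would first have to be converted to a real missing value (for example with `df.replace("null", pd.NA)`) before `ffill()` has any effect.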

Answer

0

The rows of the output dictionary are not exactly what you expected, but your dictionary syntax was wrong. Try this. It uses only pandas.

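import pandas as pd

# Sample rows: ID, Product, Item_Material, Owner, Interest %
p=[[123,"Test Item 1","Electric","Elctrotech","60%"], [124,"Test Item 2","Wood"," TY Toys","100%"],[125,"Test Item 1","Plastic","NA Materials","100%"], [123,"Test Item 1","Foo","Bar","80%"], [123,"Test Item 1","Electric","TRY TRY TRY","70%"]]

x = pd.DataFrame(p, columns=["ID","Product","Item_Material","Owner","Interest %"])

# Collect the owners and interests of each (Product, Item_Material) group into lists
x_gb = x.groupby(["Product", "Item_Material"])
grouped_Series_Owner = x_gb["Owner"].apply(list).to_dict()
grouped_Series_Interest = x_gb["Interest %"].apply(list).to_dict()

# `out` was undefined in the original snippet; it is assumed here to hold one
# row per (Product, Item_Material) pair, keyed by row index
out = x.drop_duplicates(["Product", "Item_Material"]).to_dict(orient="index")

for k in out.keys():
    # Build a fresh dict per record instead of mutating a shared one
    d = dict(ID="", Item_Material="", Owners={"Owner":[], "Interest %":[]})
    d["Item_Material"] = out[k]["Item_Material"]
    d["ID"] = out[k]["Product"]
    d["Owners"]["Owner"] = grouped_Series_Owner[(out[k]["Product"], out[k]["Item_Material"])]
    d["Owners"]["Interest %"] = grouped_Series_Interest[(out[k]["Product"], out[k]["Item_Material"])]
    print(d)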
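
If one object per owner is preferred over two parallel lists, a variation might look like the sketch below. It is not the approach used above: it assumes the parent values have already been copied down into the null rows, uses the hypothetical names `df`, `records` and `json_lines`, and follows the question in storing the product name under "ID".

```python
import json

import pandas as pd

# Hypothetical table from the question, with parent values already filled down
df = pd.DataFrame(
    [
        ["Test Item 1", "Electric", "Elctrotech", "60%"],
        ["Test Item 1", "Electric", "Spark inc", "40%"],
        ["Test Item 2", "Wood", "TY Toys", "100%"],
        ["Test Item 3", "Plastic", "NA Materials", "100%"],
    ],
    columns=["Product", "Item_Material", "Owner", "Interest %"],
)

records = []
# sort=False keeps the groups in their order of first appearance
for (product, material), group in df.groupby(["Product", "Item_Material"], sort=False):
    records.append(
        {
            "ID": product,  # the question stores the product name under "ID"
            "Item_Material": material,
            # One {"Owner": ..., "Interest %": ...} object per owner row
            "Owners": group[["Owner", "Interest %"]].to_dict(orient="records"),
        }
    )

# One JSON document per line, matching the newline-delimited output in the question
json_lines = [json.dumps(record) for record in records]
print("\n".join(json_lines))
```

Each owner becomes its own object inside the "Owners" list, which avoids repeating the same key inside a single '{}'.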