2016-11-07 196 views
-1

Pandas - empty DataFrame

I want to load the following data into my pandas DataFrame:

jsons_data = pd.DataFrame(columns=['playlist', 'user', 'track', 'count']) 

for index, js in enumerate(json_files): 
    with open(os.path.join(path_to_json, js)) as json_file: 
        json_text = json.load(json_file)
        # my json layout
        user = json_text.keys()
        playlist = 'all_playlists'
        track = [p for p in json_text.values()[0]]
        count = [p.values() for p in json_text.values()]
print jsons_data

but I get an empty DataFrame:

[u'user1'] 
all_playlists 
[{u'Make You Feel My Love': 1.0, u'I See Fire': 1.0, u'High And Dry': 1.0, u'Fake Plastic Trees': 1.0, u'One': 1.0, u'Goodbye My Lover': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
[u'user2'] 
all_playlists 
[{u'Codex': 1.0, u'No Surprises': 1.0, u'O': 1.0, u'Go It Alone': 1.0}] 
[[1.0, 1.0, 1.0, 1.0]] 
[u'user3'] 
all_playlists 
[{u'Fake Plastic Trees': 1.0, u'High And Dry': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0]] 
[u'user4'] 
all_playlists 
[{u'No Distance Left To Run': 1.0, u'Running Up That Hill': 1.0, u'Fake Plastic Trees': 1.0, u'The Numbers': 1.0, u'No Surprises': 1.0}] 
[[1.0, 1.0, 1.0, 1.0, 1.0]] 
[u'user5'] 
all_playlists 
[{u'Wild Wood': 1.0, u'You Do Something To Me': 1.0, u'Reprise': 1.0}] 
[[1.0, 1.0, 1.0]] 
Empty DataFrame 
Columns: [playlist, user, track, count] 
Index: [] 

What is wrong with the code?

Edit:

The JSON files are all structured like this:

{ 
'user1':{ 
'Karma Police':1.0, 
'Roxanne':1.0, 
'Sonnet':1.0, 
'We Will Rock You':1.0, 
}} 
+1

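You initialize the DataFrame with no values and only some column names: '['playlist','user','track','count']'... What else did you expect? You never touch the 'DataFrame' inside the loop, so how could the loop affect it? –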

+0

I don't know. I'm learning. Maybe you can teach me. –

+0

This is not a tutoring service. However, I suggest you read the 'pandas' [tutorial](http://pandas.pydata.org/pandas-docs/stable/dsintro.html). It should get you up and running in no time. –

Answers

1

OK, let's start by making some dummy data to play with, which will make the problem easier to understand:

# Dummy data to play with 
data1 = { 
'user1':{ 
    'Karma Police':1.0, 
    'Roxanne':1.0, 
    'Sonnet':1.0, 
    'We Will Rock You':1.0, 
    } 
} 

data2 = { 
'user2':{ 
    'Karma Police':1.0, 
    'Creep':1.0, 
    } 
} 

Let me illustrate the operation we will use below:

In : pd.DataFrame(data1).unstack() 

Out: 
user1 Karma Police  1.0 
     Roxanne    1.0 
     Sonnet    1.0 
     We Will Rock You 1.0 
dtype: float64 

# This is where you would normally iterate on the files 
mylist = [] 
for data in [data1, data2]: 
    # Make a dataframe then unstack, 
    # producing a series with a 2-multiindex as above 
    # and append it to the list
    mylist.append(pd.DataFrame(data).unstack()) 
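As a rough sketch of what "iterating on the files" could look like with real JSON files (Python 3): the file names and contents below are hypothetical samples written to a temporary directory so the example runs on its own, while `path_to_json` and `json_files` reuse the names from the question.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample files; they stand in for the asker's real JSON files
samples = {
    'user1.json': {'user1': {'Karma Police': 1.0, 'Roxanne': 1.0}},
    'user2.json': {'user2': {'Creep': 1.0}},
}

mylist = []
with tempfile.TemporaryDirectory() as path_to_json:
    # Write the sample files so there is something to read back
    for name, content in samples.items():
        with open(os.path.join(path_to_json, name), 'w') as f:
            json.dump(content, f)

    json_files = sorted(n for n in os.listdir(path_to_json) if n.endswith('.json'))

    for js in json_files:
        with open(os.path.join(path_to_json, js)) as json_file:
            data = json.load(json_file)
        # One Series per file, indexed by (user, track)
        mylist.append(pd.DataFrame(data).unstack())
```

Each element of `mylist` is then a Series with a two-level index, just like the dummy-data case.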

Now we concatenate that list and do some cleanup:

merged = pd.concat(mylist) 
# Renaming to get the right column names 
merged.index.names = ['User', 'Track'] 
merged.name = 'Count' 
# Convert the Series to a DataFrame
merged = merged.to_frame() 
# Adding a new column with the same value throughout 
merged['Playlist'] = 'all_playlists' 


merged 
Output: [image not preserved]

If you don't like this layout, you can then call reset_index.
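A minimal sketch of that last step, assuming Python 3 and the same dummy data as above: `reset_index` moves the two index levels back into ordinary columns. The name `flat` is only illustrative.

```python
import pandas as pd

# Same dummy data as in the answer
data1 = {'user1': {'Karma Police': 1.0, 'Roxanne': 1.0,
                   'Sonnet': 1.0, 'We Will Rock You': 1.0}}
data2 = {'user2': {'Karma Police': 1.0, 'Creep': 1.0}}

# One Series per input, indexed by (user, track)
mylist = [pd.DataFrame(d).unstack() for d in (data1, data2)]

merged = pd.concat(mylist)
merged.index.names = ['User', 'Track']
merged.name = 'Count'
merged = merged.to_frame()
merged['Playlist'] = 'all_playlists'

# Turn the (User, Track) index levels into regular columns
flat = merged.reset_index()
print(flat)
```

The result has one row per (user, track) pair and four plain columns: User, Track, Count and Playlist.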

+0

Great, thanks –

0

At the end of the loop, just add:

jsons_data.loc[index] = [playlist, user, track, count] 

It prints:

playlist     user \ 
0 decaf   [user1] 
1 decaf   [user2] 
2 decaf   [user3] 
3 decaf   [user4] 
4 decaf   [user5] 

               track \ 
0 [Make You Feel My Love, I See Fire, High And D... 
1    [Codex, No Surprises, O, Go It Alone] 
2 [Fake Plastic Trees, High And Dry, No Surprises] 
3 [No Distance Left To Run, Running Up That Hill... 
4  [Wild Wood, You Do Something To Me, Reprise] 

            count 
0 [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]] 
1     [[1.0, 1.0, 1.0, 1.0]] 
2      [[1.0, 1.0, 1.0]] 
3   [[1.0, 1.0, 1.0, 1.0, 1.0]] 
4      [[1.0, 1.0, 1.0]] 
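The assignment above stores whole lists in each cell. As a hedged alternative sketch (Python 3, not part of the original answer), the same loop could collect one row per track and build the DataFrame once at the end. The `parsed_files` list is a hypothetical stand-in for the results of `json.load` on each file.

```python
import pandas as pd

# Hypothetical stand-in for the parsed JSON files (same layout as in the question)
parsed_files = [
    {'user1': {'Karma Police': 1.0, 'Roxanne': 1.0}},
    {'user2': {'Creep': 1.0}},
]

rows = []
for json_text in parsed_files:
    for user, tracks in json_text.items():
        for track, count in tracks.items():
            # One flat record per (user, track) pair
            rows.append({'playlist': 'all_playlists', 'user': user,
                         'track': track, 'count': count})

jsons_data = pd.DataFrame(rows, columns=['playlist', 'user', 'track', 'count'])
print(jsons_data)
```

Each cell then holds a scalar, so filtering, grouping and exporting work without unpacking lists first.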
+2

That's pretty hard to work with, no? It completely defeats the point of using pandas –

+0

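@JulienMarrec Data analysis isn't the friendliest environment. But it looks like once this framework has done the heavy lifting, plotting and exporting the data ('SQLite', etc.) is very simple. –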