2017-05-05 57 views

Group by min and fill NAs with value from another column (part 2)

I have a DataFrame like this:

import numpy as np
import pandas as pd

mydf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

I want to end up with this DataFrame:

endingdf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
    'new_start_page': ['home', 'home', 'home', 'home', 'home', 'blah', 'home', 'home', 'home', 'home'],
})

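What I want to do is group by uid and, where startpage is null, fill it with the pagename of the first visit (minimum date_time), but only if that row has page_event == 0. So if the first pagename has page_event == 10, skip that row and keep going until a row with page_event == 0 is found.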

Answer

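e = mydf.page_event
p = mydf.pagename
s = mydf.startpage
u = mydf.uid
# For each uid, find the index label of the first row whose page_event is not 10.
# This relies on the rows already being ordered by date_time within each uid.
m = e.mask(e == 10).groupby(u).apply(pd.Series.first_valid_index)

# Map uid -> row label -> pagename, then fill only the missing startpage values.
# Assigning back avoids relying on inplace=True propagating to mydf.
mydf['startpage'] = s.fillna(u.map(m).map(p))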

print(mydf) 

   date_time  page_event pagename startpage  uid
0          0           0     home      home    1
1          1           0     blah      home    1
2          2           0     blah      home    1
3          5           0     home      home    2
4          9           0     blah      home    2
5          1           0     blah      blah    3
6          1          10     blah      home    4
7          2           0     home      home    4
8          3           0     blah      home    4
9          4          10     blah      home    4
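
The answer above overwrites startpage and depends on the rows already being sorted by date_time within each uid. As a variation (not part of the original answer), here is a minimal sketch that sorts explicitly, keeps only rows with page_event == 0, and writes the result to a separate new_start_page column, as in the desired endingdf. The name first_page is just an illustrative helper.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah',
                 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home',
                  'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

# Keep only rows with page_event == 0, order them by time,
# and take the earliest pagename for each uid.
first_page = (
    mydf[mydf['page_event'] == 0]
    .sort_values('date_time')
    .groupby('uid')['pagename']
    .first()
)

# Fill missing startpage values from the per-uid lookup, leaving
# the original startpage column untouched.
mydf['new_start_page'] = mydf['startpage'].fillna(mydf['uid'].map(first_page))

print(mydf[['uid', 'startpage', 'new_start_page']])
```

Filtering on page_event == 0 matches the wording of the question directly, whereas masking page_event == 10 only behaves the same when 0 and 10 are the only values present.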