2016-12-07

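What is a better way to update a pandas DataFrame while iterating over it and making changes? In the example below I currently locate the rows to update by indexing with ix, which I believe is not the best approach, especially when the data is large: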

print(df)

id | A  | B 
01 | 374 | 2014-02-01 04:45:04.401502 
02 | 284 | 2014-03-12 21:23:12.401502 
03 | 183 | 2014-02-01 09:12:08.401502 

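import datetime
import random

for row in df.itertuples():
    # row[0] is the index, so the columns start at position 1
    row_id = row[1]
    col_a = row[2]
    col_b = row[3]

    N = random.randint(2, 5)
    for i in range(N):
        new_col_a = col_a + 1
        new_col_b = datetime.datetime.now()

        # update dataframe's A, B respectively
        # (originally df.ix, which was removed in pandas 1.0; .loc is the replacement)
        df.loc[df['id'] == row_id, ['A', 'B']] = [new_col_a, new_col_b]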


print(df)

id | A  | B 
01 | 374 | 2014-02-01 04:45:04.401502 
01 | 375 | 2016-12-07 07:45:04.401502 
01 | 376 | 2016-12-07 07:45:04.401502 
01 | 377 | 2016-12-07 07:45:04.401502
02 | 284 | 2014-03-12 21:23:12.401502 
02 | 285 | 2016-12-07 07:45:04.401502 
02 | 286 | 2016-12-07 07:45:04.401502 
03 | 183 | 2014-02-01 09:12:08.401502 
03 | 184 | 2016-12-07 07:45:04.401502 
03 | 185 | 2016-12-07 07:45:04.401502 
03 | 186 | 2016-12-07 07:45:04.401502 

It looks like you just want to change the column names.

Answer

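This is not a very elegant solution, because it still loops.

First, apply a custom function to df that builds a new DataFrame for each row and appends it to the list dfs. Then concat the pieces and apply the steps that cannot be done inside the custom function: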

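import numpy as np
import pandas as pd

np.random.seed(10)
dfs = []

def expand(x):
    # total number of rows this source row expands to (original included)
    N = np.random.choice([2, 3, 4])
    # one-row frame built from the Series, padded with NaN rows up to N
    df = pd.DataFrame([x.values.tolist()], columns=x.index).reindex(range(N))
    # padded rows count as +1, so cumsum yields A, A+1, A+2, ...
    df.A = df.A.fillna(1).cumsum()
    df.insert(1, 'prevA', df.A.shift())
    dfs.append(df)

df.apply(expand, axis=1)

df1 = pd.concat(dfs, ignore_index=True)
df1.A = df1.A.astype(int)
df1.id = df1.id.ffill().astype(int)
df1.prevA = df1.prevA.bfill().astype(int)
# pd.datetime is gone from current pandas; pd.Timestamp.now() replaces it
df1.B = df1.B.fillna(pd.Timestamp.now())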

print(df1)

   id  prevA    A                          B
0   1    374  374 2014-02-01 04:45:04.401502
1   1    374  375 2016-12-07 10:48:14.299336
2   1    375  376 2016-12-07 10:48:14.299336
3   2    284  284 2014-03-12 21:23:12.401502
4   2    284  285 2016-12-07 10:48:14.299336
5   2    285  286 2016-12-07 10:48:14.299336
6   3    183  183 2014-02-01 09:12:08.401502
7   3    183  184 2016-12-07 10:48:14.299336
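
Since the answer above admits that it still loops, here is a rough sketch of a loop-free variant. It is not the answerer's method: it swaps in row repetition with np.repeat plus groupby().cumcount(). The sample frame is rebuilt from the question's table, the names key, extra, step and out are made up for illustration, and I am assuming that "N new rows per original row, with A counting up by 1" is what the desired output means.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumption: id is an int,
# B is a datetime column).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "A": [374, 284, 183],
    "B": pd.to_datetime([
        "2014-02-01 04:45:04.401502",
        "2014-03-12 21:23:12.401502",
        "2014-02-01 09:12:08.401502",
    ]),
})

rng = np.random.RandomState(10)
# Number of generated rows per original row, 2..5 inclusive,
# mirroring random.randint(2, 5) in the question.
extra = rng.randint(2, 6, size=len(df))

# Positional key: original row i appears (1 + extra[i]) times.
key = np.repeat(np.arange(len(df)), extra + 1)
out = df.iloc[key].reset_index(drop=True)

# Position inside each repeated block: 0 = original row, 1..N = generated rows.
step = out.groupby(key).cumcount()

# A counts up by one per generated row.
out["A"] = out["A"] + step
# Only the generated rows get the current timestamp; originals keep theirs.
out.loc[step > 0, "B"] = pd.Timestamp.now()

print(out)
```

Everything here works on whole columns, so the cost no longer grows with one boolean-mask lookup per generated row. If a prevA column is wanted as in the answer above, out["A"] - (step > 0) should give the same values, though I have not compared the two on real data.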