0
我有这样Python:如何计算pandas数据框中对之间的协作?
df = pd.DataFrame({'Item':['A','A','A','B','B','C','C','C','C'],
'Name':[Tom,John,Paul,Tom,Frank,Tom, John, Richard, James],
'Weight:[2,2,2,3,3,5, 5, 5, 5]'})
df
Item Name Weight
A Tom 4
A John 4
A Paul 4
B Tom 3
B Frank 3
C Tom 5
C John 5
C Richard 5
C James 5
对于每个人,我想与同一项目的人的名单平均在weight
df1
Name People Times
Tom [John, Paul, Frank, Richard, James] [(1/4+1/5),1/4,1/3,1/5,1/5]
John [Tom, Richard, James] [(1/4+1/5),1/5,1/5]
Paul [Tom, John] [1/4,1/4]
Frank [Tom] [1/3]
Richard [Tom, John, James] [1/5,1/5,1/5]
James [Tom, John, Richard] [1/5,1/5,1/5]
为了计算协作的时间,而不考虑的一个数据帧weight
,我所做的:
#merge M:N by column Item
df1 = pd.merge(df, df, on=['Item'])
#remove duplicity - column Name_x == Name_y
df1 = df1[~(df1['Name_x'] == df1['Name_y'])]
#print df1
#create lists
df1 = df1.groupby('Name_x')['Name_y'].apply(lambda x: x.tolist()).reset_index()
print df1
Name_x Name_y
0 Frank [Tom]
1 James [Tom, John, Richard]
2 John [Tom, Paul, Tom, Richard, James]
3 Paul [Tom, John]
4 Richard [Tom, John, James]
5 Tom [John, Paul, Frank, John, Richard, James]
#get count by np.unique
df1['People'] = df1['Name_y'].apply(lambda a: np.unique((a), return_counts =True)[0])
df1['times'] = df1['Name_y'].apply(lambda a: np.unique((a), return_counts =True)[1])
#remove column Name_y
df1 = df1.drop('Name_y', axis=1).rename(columns={'Name_x':'Name'})
print df1
Name People times
0 Frank [Tom] [1]
1 James [John, Richard, Tom] [1, 1, 1]
2 John [James, Paul, Richard, Tom] [1, 1, 1, 2]
3 Paul [John, Tom] [1, 1]
4 Richard [James, John, Tom] [1, 1, 1]
5 Tom [Frank, James, John, Paul, Richard] [1, 1, 2, 1, 1]
在过去的数据帧我有科拉的计数所有对之间硼化,但是我想他们的合作
这是我想要的但我收到以下错误 – emax
对不起,我跳过了一步,看到更新。 – Stefan
太棒了!!!!!! – emax