Python pandas: case statement in agg function
I have an SQL statement like this:
select id
, avg(case when rate=1 then rate end) as "P_Rate"
, stddev(case when rate=1 then rate end) as "std P_Rate"
, avg(case when f_rate = 1 then f_rate else 0 end) as "A_Rate"
, stddev(case when f_rate = 1 then f_rate else 0 end) as "std A_Rate"
from (
select id, connected_date,payment_type,acc_type,
max(case when s_rate > 1 then 1 else 0 end)/count(open) as rate,
sum(case when hire_days <= 5 and paid>1000 then 1 else 0 end)/count(open) as f_rate
from analysis_table where alloc_date <= '2016-01-01' group by 1,2
) a group by id
I am trying to rewrite it with pandas. First I create the data frame for the "inner" table:
filtered_data = data.where(data['alloc_date'] <= analysis_date)
Then I group this data:
grouped = filtered_data.groupby(['id','connected_date'])
But then I have to apply a filter to each column before using max/sum on it.
I tried something like this:
`def my_agg_function(hire_days, paid, open):
    r_arr = []
    if hire_days <= 5 and paid > 1000:
        r_arr.append(1)
    else:
        r_arr.append(0)
    return np.max(r_arr)/len(????)
inner_table['f_rate'] = grouped.agg(lambda row: my_agg_function(row['hire_days'],row['paid'],row['open']))`
and the same for rate.
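For reference, one way the inner query could be sketched in pandas is to turn each `case when ... then 1 else 0 end` into a 0/1 flag column before grouping, so that plain `max`, `sum` and `count` are enough afterwards. The sample rows and the helper column names (`s_flag`, `f_flag`, `s_max`, `f_sum`, `n_open`) are made up for illustration and are not part of the original table; named aggregation also assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample standing in for analysis_table (values invented for illustration).
data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "connected_date": ["2015-12-01", "2015-12-01", "2015-12-02",
                       "2015-12-01", "2015-12-01"],
    "alloc_date": pd.to_datetime(["2015-11-01", "2015-11-15", "2015-12-20",
                                  "2015-10-01", "2016-02-01"]),
    "s_rate": [0.5, 2.0, 0.1, 3.0, 5.0],
    "hire_days": [3, 10, 4, 2, 1],
    "paid": [1500, 2000, 500, 1200, 9000],
    "open": [1, 1, 1, 1, 1],
})
analysis_date = pd.Timestamp("2016-01-01")

# WHERE alloc_date <= analysis_date: boolean indexing drops the non-matching rows.
filtered = data[data["alloc_date"] <= analysis_date].copy()

# Each CASE WHEN ... THEN 1 ELSE 0 END becomes a 0/1 flag column.
filtered["s_flag"] = (filtered["s_rate"] > 1).astype(int)
filtered["f_flag"] = (
    (filtered["hire_days"] <= 5) & (filtered["paid"] > 1000)
).astype(int)

# GROUP BY id, connected_date with max / sum / count on the prepared columns.
inner_table = (
    filtered.groupby(["id", "connected_date"])
    .agg(s_max=("s_flag", "max"),
         f_sum=("f_flag", "sum"),
         n_open=("open", "count"))
    .reset_index()
)
inner_table["rate"] = inner_table["s_max"] / inner_table["n_open"]
inner_table["f_rate"] = inner_table["f_sum"] / inner_table["n_open"]
print(inner_table)
```

Computing the flags row by row up front avoids a custom Python function inside `agg`, which receives one column at a time and so cannot combine `hire_days` and `paid` directly. For the outer query, `Series.where` leaves NaN where the condition is false, which mirrors a `case` with no `else` branch.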
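OK, let's say the click counts look like (.023, 1.2, 0.4, 2.4, 2.1, .1, 2), and you want to compute the sum, not as (.023 + 1.2 etc.), but as: if number_of_clicks < 1 then 0 else 1, and after this calculation sum(1 + 1 + 0 + 1 ..) – gostin
Then do something like the following before the groupby: `df['number_of_clicks'] = df['number_of_clicks'] >= 1`. You will get a boolean `Series` (which is also 0 and 1 to Python), and the sum in the groupby will give you what you want. – ysearka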
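A minimal sketch of that suggestion, using the click counts from the comment above. The `user` grouping column is hypothetical, and the flag is stored in a separate `clicked` column here (rather than overwriting `number_of_clicks`) only so the raw values stay available.

```python
import pandas as pd

# Click counts from the comment; "user" is a made-up grouping key.
df = pd.DataFrame({
    "user": ["a", "a", "a", "a", "b", "b", "b"],
    "number_of_clicks": [0.023, 1.2, 0.4, 2.4, 2.1, 0.1, 2],
})

# Boolean Series: True where the count is at least 1, False otherwise.
df["clicked"] = df["number_of_clicks"] >= 1

# Summing booleans counts the True values within each group.
clicks_per_user = df.groupby("user")["clicked"].sum()
print(clicks_per_user)
```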