import pandas as pd
import numpy as np
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)
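I know I can use map like a VLOOKUP against a Series with a common index. I'd like to take this a step further and multiply each cost by the mark-up on the common index to produce a new column called price.
I know I can merge the two and then run the calculation; that is how I produced the desired output. What I want is something similar to looping through a dictionary, using each key to look up a value in another dictionary, and performing a calculation inside the loop. Considering that pandas DataFrames sit on top of dictionaries, there must be a way to do this with some combination of join/map/apply, without actually joining the two data sets in memory.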
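A minimal sketch of the dictionary-style lookup described above, assuming `Series.map` is an acceptable stand-in for the loop. The frames are cut-down copies of `pb` and `df` from the post; the row with `mark_up_id` "333" is included to show what happens when a key is absent.

```python
import pandas as pd

# Cut-down stand-ins for the question's pb and df (values copied from the post).
pb = pd.DataFrame(
    {"mark_up_id": ["123", "456", "789"], "mark_up": [1.2987, 1.5625, 1.3698]}
).set_index("mark_up_id")
df = pd.DataFrame(
    {
        "id": ["K69", "K70", "K74"],
        "cost": [29.74, 9.42, 9.48],
        "mark_up_id": ["123", "456", "333"],
    }
)

# Series.map looks up each mark_up_id in the index of pb["mark_up"],
# much like dict[key] per row. Ids missing from pb become NaN.
df["price"] = df["cost"] * df["mark_up_id"].map(pb["mark_up"])
print(df)
```

No merged frame is built here; the only intermediate object is the mapped Series of mark-up values.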
Desired output:
desired_output = {"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48},"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73"},"mark_up_id":{"0":"123","1":"456","2":"111","3":"123","4":"789"},"price":{"0":38.623338,"1":14.71875,"2":12.559686,"3":12.233754,"4":12.985704}}
do = pd.DataFrame(data=desired_output)
Bonus points:
Explain the difference between the accepted answer and...
pb.loc[df['mark_up_id']]['mark_up'] * df.set_index('mark_up_id')['cost']
...and why the lambda function below, which I derived from the above, hits an error:
df.apply(lambda x : x['cost']*pb.loc[x['mark_up_id']],axis=1)
It returns an error saying:
KeyError: ('the label [333] is not in the [index]', u'occurred at index 5')
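For context, row 5 of `df` carries `mark_up_id` "333", which has no entry in `pb`, and `.loc` with a single missing label raises rather than returning NaN. (In recent pandas versions, a list-like `.loc` lookup containing missing labels raises a `KeyError` as well.) The sketch below reproduces that on cut-down data and shows one possible row-wise alternative using `Series.get`; the names `raised` and `price` are illustrative only.

```python
import pandas as pd

pb = pd.DataFrame(
    {"mark_up_id": ["123", "456"], "mark_up": [1.2987, 1.5625]}
).set_index("mark_up_id")
df = pd.DataFrame({"cost": [29.74, 9.48], "mark_up_id": ["123", "333"]})

# .loc with a single label that is not in the index raises KeyError.
try:
    pb.loc["333"]
    raised = False
except KeyError:
    raised = True
print(raised)

# Series.get returns a default instead of raising, so a row-wise apply
# can survive ids that pb does not contain. Selecting the "mark_up"
# column first also yields a scalar per row rather than a whole pb row.
price = df.apply(
    lambda row: row["cost"] * pb["mark_up"].get(row["mark_up_id"], float("nan")),
    axis=1,
)
print(price)
```

A row-wise `apply` runs a Python function per row, so it is usually slower than a vectorised lookup on large frames.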
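Does this only work when multiplying two Series objects of the same length? What if the indexes differ and one Series is longer?
map will map the mark_up_id values in df to str_price_band in pb and return the price multiplied by the corresponding mark_up value, so the lengths do not have to be the same. – Vaishali
If df contains a mark_up_id that does not exist in pb, it obviously cannot find a corresponding mark_up and will return NaN. – Vaishali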
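On the length question, a small sketch of how arithmetic between two Series behaves when their indexes differ. The Series names `mark_up` and `cost` are illustrative; the values are taken from the post.

```python
import pandas as pd

# Two Series of different lengths with partly overlapping labels.
mark_up = pd.Series({"123": 1.2987, "456": 1.5625})
cost = pd.Series({"123": 29.74, "456": 9.42, "333": 9.48})

# Arithmetic aligns on index labels, not on position or length.
# Labels present in only one operand come out as NaN.
price = mark_up * cost
print(price)
```

The result carries the union of both indexes, so unmatched labels stay visible as NaN and can be dropped or filled afterwards.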