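Pandas DataFrame: writing values to a column based on a check of an existing column's values

I want to add a column to a pd.DataFrame in which I write values based on a check against an existing column.

I want to check the values against a dictionary. Say I have the following dictionary: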

{"<=4":[0,4], "(4,10]":[4,10], ">10":[10,inf]}

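Now I want to check one column of my DataFrame to see whether each value in that column falls into any of the intervals in the dictionary. If it does, I want to write the matching dictionary key to a second column in the same DataFrame.

So a DataFrame like:

   col_1
a      3
b     15
c      8

would become:

   col_1   col_2
a      3   "<=4"
b     15   ">10"
c      8   "(4,10]"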

Source

2015-12-02 farnold

Hope the following helps. –

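The pd.cut() function converts a continuous variable into a categorical one. Here the bin edges are [0, 4, 10, np.inf], which gives three categories. By default the bins are closed on the right, so they are (0, 4], (4, 10] and (10, inf]: any value greater than 0 and up to 4 is assigned to (0, 4], any value greater than 4 and up to 10 is assigned to (4, 10], and so on. Note that with these defaults a value of exactly 0 falls outside the first bin unless include_lowest=True is passed.

You then give each category a name, in the same order, through the labels parameter. Since there are three categories, we pass ['<=4', '(4,10]', '>10'] as labels, which means that the (0, 4] category is named <=4, the (4, 10] category is named (4,10], and so on.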

In [83]:
df['col_2'] = pd.cut(df.col_1, [0, 4, 10, np.inf], labels=['<=4', '(4,10]', '>10'])
df
Out[83]:
   col_1   col_2
0      3     <=4
1     15     >10
2      8  (4,10]
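
The bin edges and labels above are typed out by hand. As a rough sketch (not part of the original answer), they could also be derived from the dictionary in the question, assuming its ranges are contiguous and non-overlapping. The names intervals, ordered, edges and labels are illustrative only, and include_lowest=True is added here so that a value of exactly 0 lands in the first bin:

```python
import numpy as np
import pandas as pd

# The dictionary and sample data from the question
intervals = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, np.inf]}
df = pd.DataFrame({"col_1": [3, 15, 8]}, index=["a", "b", "c"])

# Assumption: ranges are contiguous, so sorting by lower bound lines up
# the labels with the bin edges.
ordered = sorted(intervals.items(), key=lambda kv: kv[1][0])
labels = [key for key, _ in ordered]
edges = [ordered[0][1][0]] + [bounds[1] for _, bounds in ordered]

# Bins are right-closed by default; include_lowest=True also admits the
# lowest edge (0) into the first bin.
df["col_2"] = pd.cut(df["col_1"], bins=edges, labels=labels, include_lowest=True)
print(df)
```

If the dictionary had gaps or overlaps between ranges, this approach would not apply as written, since pd.cut expects a single increasing sequence of edges.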

Source

2015-12-03 09:25:51

Care to [explain your solution](http://stackoverflow.com/help/how-to-answer)? –

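You can use this approach:

import pandas as pd

# One row per label; column 0 holds the lower bound, column 1 the upper bound
dico = pd.DataFrame({"<=4":[0,4], "(4,10]":[4,10], ">10":[10,float('inf')]}).transpose()

# Picks the first label with lower <= x < upper; raises IndexError if none matches
foo = lambda x: dico.index[(dico[1]>x) & (dico[0]<=x)][0]

df['col_1'].map(foo)

#0       <=4
#1       >10
#2    (4,10]
#Name: col_1, dtype: object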

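Source

2015-12-02 17:19:26

This solution creates a function named extract_str and applies it to col_1. It uses a conditional list comprehension to iterate over the keys and values in the dictionary, checking whether the value is greater than or equal to the lower bound and less than the upper bound. A check ensures that the resulting list does not contain more than one match. If the list contains a value, it is returned; otherwise the function returns None by default.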

from numpy import inf

d = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, inf]}

def extract_str(val):
    # Collect every key whose range satisfies lower <= val < upper
    results = [key for key, value_range in d.items()
               if value_range[0] <= val < value_range[1]]
    if len(results) > 1:
        raise ValueError('Multiple ranges satisfied.')
    if results:
        return results[0]
    # No match: falls through and returns None

df['col_2'] = df.col_1.apply(extract_str)

>>> df
   col_1   col_2
a      3     <=4
b     15     >10
c      8  (4,10]

On this small DataFrame, this solution is considerably faster than the one provided by @ColonelBeauvel.

%timeit df['col_2'] = df.col_1.apply(extract_str) 
1000 loops, best of 3: 220 µs per loop 

%timeit df['col_2'] = df['col_1'].map(foo) 
1000 loops, best of 3: 1.46 ms per loop
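
The lookup in extract_str scans every dictionary entry for each value. As a hedged sketch of a different technique (a binary search with the standard-library bisect module, not something from the original answers), the same lower <= val < upper rule could be applied to sorted lower bounds. It assumes the ranges are contiguous and non-overlapping; the names ordered, lower_bounds and lookup are hypothetical:

```python
from bisect import bisect_right
from math import inf

d = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, inf]}

# Assumption: ranges are contiguous and do not overlap.
ordered = sorted(d.items(), key=lambda kv: kv[1][0])
lower_bounds = [bounds[0] for _, bounds in ordered]

def lookup(val):
    """Return the key whose range satisfies lower <= val < upper, else None."""
    # Index of the last range whose lower bound is <= val
    pos = bisect_right(lower_bounds, val) - 1
    if pos < 0:
        return None  # val is below every lower bound
    key, (_, upper) = ordered[pos]
    return key if val < upper else None

print([lookup(v) for v in (3, 15, 8)])
```

It could be applied the same way as extract_str, for example with df.col_1.apply(lookup). No timing is claimed for it here.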

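Source

2015-12-02 17:19:43 Alexander

Thanks for your answer! I find @Nader Hisham's answer a somewhat more elegant solution to the original question. However, your answer helped me a lot with another problem, comparing a DataFrame column against a dictionary(-like) object! – farnold

You can use a function for the mapping, as in this example. I hope it helps.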

import pandas as pd
from numpy import inf

d = {'col_1': [3, 15, 8]}
test = pd.DataFrame(d, index=['a', 'b', 'c'])
newdict = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, inf]}

def mapDict(num):
    # 0 sits on the lower edge of the first range, so handle it explicitly
    if num == 0:
        return "<=4"
    for key, value in newdict.items():
        lower, upper = value
        # Ranges are open on the left and closed on the right: lower < num <= upper
        if lower < num <= upper:
            return key

test['col_2'] = test.col_1.map(mapDict)

test then becomes:

   col_1   col_2
a      3     <=4
b     15     >10
c      8  (4,10]

PS. I would like to know how to format code quickly on Stack Overflow. Can anyone share some tips?

Source

2015-12-02 17:29:39
