Counting occurrences of a list of values in a pandas DataFrame

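I am trying to create a bar chart of element frequencies using matplotlib. To do that, I need to count how many times each flag from a list of flags occurs in a pandas DataFrame column. Below is a sketch of the code I have in my notebook, along with some of the data: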

    # list of filtered values
    filtered = [200, 201, 201, 201, 201, 201, 
    211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 211, 
    237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
    237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
    237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250, 
    250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
    250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
    250, 250, 250, 250, 254] 

    # list of flags to use for filtering 
    flags = [200, 201, 211, 237, 239, 250, 254, 255] 
    # placeholder dict, created just for testing
    flags_dict = {200:0,201:0,211:0,237:0,239:0,250:0,254:0,255:0} 

    # in the real notebook, filtered is a pandas Series (a DataFrame column), not a plain list
    freq = filtered.value_counts() 


    """ 
    Expected flags_dict: 
    200: 1 
    201: 5 
    211: 14 
    237: 38 
    239: 0 
    250: 41 
    254: 1 
    255: 0 
    """ 

    """ 
    These are the values from the real dataframe but they do not take into 
    account the other flags in the flags list 
    freq: 
    250.0 7682 
    211.0 3734 
    200.0 1483 
    239.0  180 
    201.0  34  
    """

Source

2016-12-05 KatieRose1029

I just got it working with this, but there has to be a better way:

    # column_data is a list created from a pandas DataFrame column
    column_data = list(filtered['C5 Terra'])
    flags_dict[200] = column_data.count(200)
    flags_dict[201] = column_data.count(201)
    flags_dict[211] = column_data.count(211)
    flags_dict[237] = column_data.count(237)
    flags_dict[239] = column_data.count(239)
    flags_dict[250] = column_data.count(250)
    flags_dict[254] = column_data.count(254)
    flags_dict[255] = column_data.count(255)
    flags_dict

What is a better/faster way to do this?

Source

2016-12-05 22:18:35 KatieRose1029

If I understand correctly, this is what you need:

import pandas as pd 

filtered = [200, 201, 201, 201, 201, 201, 211, 211, 211, 211, 211, 211, 211, 211, 211, 
      211, 211, 211, 211, 211, 
      237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
      237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 237, 
      237, 237, 237, 237, 237, 237, 237, 237, 250, 250, 250, 250, 250, 250, 250, 
      250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
      250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 250, 
      250, 250, 250, 250, 254] 


filtered = pd.Series(filtered) 

freq = filtered.value_counts(sort=False) 
flags = [200, 201, 211, 237, 239, 250, 254, 255] 
flags_dict = {} 
for flag in flags: 
    try: 
        # label-based lookup; raises KeyError when the flag never occurs
        flags_dict[flag] = freq[flag] 
    except KeyError: 
        flags_dict[flag] = 0
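
The same dictionary can also be built without the try/except. Here is a minimal sketch using collections.Counter from the standard library, which returns 0 for keys it has never seen. It assumes the column has already been converted to a plain list, and it writes the sample data compactly (same values as the list above):

```python
from collections import Counter

# Same sample values as the question's list, written compactly.
filtered = [200] + [201] * 5 + [211] * 14 + [237] * 38 + [250] * 41 + [254]
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Counter tallies every value in one pass; looking up a missing key yields 0.
counts = Counter(filtered)
flags_dict = {flag: counts[flag] for flag in flags}
print(flags_dict)
```

If the counts are already in a Series such as freq, freq.get(flag, 0) should play the same role as the Counter lookup.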

Source

2016-12-05 22:19:21 zipa

This can be done with isin, and it makes for a very simple answer.

Assuming filtered is a Series:

In [1]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
Out[1]:
200     1
201     5
211    14
237    38
239     0
250    41
254     1
255     0
dtype: int64
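
Because reindex keeps only the labels it is given, the isin filter looks optional for this purpose. Here is a small sketch that checks this on the sample data (written compactly), assuming filtered is a Series of integers:

```python
import pandas as pd

# Same sample values as the question's list, written compactly.
filtered = pd.Series([200] + [201] * 5 + [211] * 14 + [237] * 38 + [250] * 41 + [254])
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# Count with and without the isin pre-filter, then align both results to the flags list.
with_isin = filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0)
without_isin = filtered.value_counts().reindex(flags, fill_value=0)

# True when both Series have identical values and index.
print(with_isin.equals(without_isin))
```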

To get a dictionary, just append to_dict():

In [2]: filtered[filtered.isin(flags)].value_counts().reindex(flags, fill_value=0).to_dict() 

Out[2]: {200: 1, 201: 5, 211: 14, 237: 38, 239: 0, 250: 41, 254: 1, 255: 0}
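
Since the stated goal was a matplotlib bar chart, here is a rough sketch of how the reindexed counts might be plotted. The Agg backend, the axis labels, and the choice to turn the flags into string categories are assumptions made for illustration, not something taken from the question:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Same sample values as the question's list, written compactly.
filtered = pd.Series([200] + [201] * 5 + [211] * 14 + [237] * 38 + [250] * 41 + [254])
flags = [200, 201, 211, 237, 239, 250, 254, 255]

# One count per flag, in the order of the flags list, with 0 for absent flags.
counts = filtered.value_counts().reindex(flags, fill_value=0)

fig, ax = plt.subplots()
# String labels make the x axis categorical, so the bars are evenly spaced.
ax.bar([str(flag) for flag in counts.index], counts.to_numpy())
ax.set_xlabel("flag")
ax.set_ylabel("count")
```

In a notebook, drop the matplotlib.use line and call plt.show(), or save the figure with fig.savefig.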

Source

2016-12-06 00:06:55
