2017-03-02 57 views

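I have a pandas.DataFrame and want to select certain rows from it according to some rules. With Python 3 and pandas, how can I select and display data by such rules?

The following code generates the DataFrame: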

import datetime 
import pandas as pd 
import numpy as np 

today = datetime.date.today() 
dates = list() 
for k in range(10): 
    a_day = today - datetime.timedelta(days=k) 
    dates.append(np.datetime64(a_day)) 

np.random.seed(5) 
df = pd.DataFrame(np.random.randint(100, size=(10, 3)), 
        columns=('other1', 'actual', 'other2'), 
        index=['{}'.format(i) for i in range(10)]) 

df.insert(0, 'dates', dates) 
# Use 1-D arrays: a single column expects one value per row, not a (10, 1) matrix
df['err_m'] = np.random.rand(10) * 0.1
df['std'] = np.random.rand(10) * 0.05
df['gain'] = np.random.rand(10)

Now I want to select rows by the following rules:

1. compute the sum of 'err_m' and 'std', then sort the df so that the sum is descending 
2. from the result of step 1, select the part where 'actual' is > 50  

Thanks

Answer

  1. Create a new column, then sort by it:

    df['errsum'] = df['err_m'] + df['std']
    # Return a sorted DataFrame (DataFrame.sort was removed from pandas; use sort_values)
    df_sorted = df.sort_values('errsum', ascending=False)
    
  2. Select the rows you want:

    # Create a boolean mask that is True where 'actual' is greater than 50
    selector = df_sorted['actual'] > 50
    # Return a new DataFrame containing only the rows you want
    df_sorted[selector]
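
The two rules from the question can also be combined into one chained expression that leaves the original DataFrame unmodified. The following is only a minimal sketch: the stand-in data, the `rng` generator, and the names `errsum` and `result` are assumptions made for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data using the column names from the question.
rng = np.random.default_rng(5)
df = pd.DataFrame({
    'actual': rng.integers(0, 100, size=10),
    'err_m': rng.random(10) * 0.1,
    'std': rng.random(10) * 0.05,
})

result = (
    # Rule 1: add the sum of 'err_m' and 'std' as a temporary column
    df.assign(errsum=df['err_m'] + df['std'])
      # ... and sort so that the sum is descending
      .sort_values('errsum', ascending=False)
      # Rule 2: keep only the rows where 'actual' is greater than 50
      .loc[lambda d: d['actual'] > 50]
)
print(result)
```

Because `assign` returns a new DataFrame, `df` itself does not gain an `errsum` column; drop it from `result` with `result.drop(columns='errsum')` if it is not needed in the output.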
    

Thanks for the lines. They solved my problem. – aura