2012-07-07 64 views
1

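This code works, but I can't help feeling that it is a hack, especially the "offset" part. I had to put it in because otherwise each deletion shifts the indexes of all the remaining items by one. Is there a better way to remove statistical outliers than this?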

# remove outliers more than `devs` standard deviations from the mean
devs = 1
deletes = []
for num, duration in enumerate(durations):
    if (duration > (mean_duration + (devs * std_dev_one_test))) or \
            (duration < (mean_duration - (devs * std_dev_one_test))):
        deletes.append(num)
offset = 0
for delete in deletes:
    del durations[delete - offset]
    del dates[delete - offset]
    offset += 1

Any ideas on how to make it better?

+0

`(duration > (mean_duration + (devs * std_dev_one_test))) or (duration < (mean_duration - (devs * std_dev_one_test)))` simplifies to `abs(duration - mean_duration) > devs * std_dev_one_test` without losing any readability. – PaulMcG 2012-07-07 07:22:05
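To illustrate the comment above, here is a small sketch comparing the two forms of the test. The function names `is_outlier_verbose` and `is_outlier` are made up for this example, and the mean of 10.0 and standard deviation of 2.0 are arbitrary values chosen so the arithmetic is exact.

```python
def is_outlier_verbose(duration, mean_duration, std_dev_one_test, devs=1):
    # The two-sided comparison from the question.
    return (duration > (mean_duration + (devs * std_dev_one_test))) or \
        (duration < (mean_duration - (devs * std_dev_one_test)))


def is_outlier(duration, mean_duration, std_dev_one_test, devs=1):
    # The same test written as a distance from the mean.
    return abs(duration - mean_duration) > devs * std_dev_one_test


# Compare both forms on a range of sample durations (hypothetical values).
checks = [
    (d, is_outlier_verbose(d, 10.0, 2.0), is_outlier(d, 10.0, 2.0))
    for d in range(0, 21)
]
```

With floating-point inputs the two forms could in principle disagree for a value sitting exactly on the boundary, so treat them as equivalent for practical purposes rather than bit for bit.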

Answers

4

Build up a list of keepers as you iterate over the list:

def isKeeper(duration):
    if (duration > (mean_duration + (devs * std_dev_one_test))) or \
            (duration < (mean_duration - (devs * std_dev_one_test))):
        return False
    return True

durations = [duration for duration in durations if isKeeper(duration)]
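The question deletes from both `durations` and `dates`, so the keeper idea probably needs to filter the two lists together. A minimal sketch of one way to do that with `zip` follows; the helper name `keep_inliers`, the sample data, and the use of the population standard deviation (`statistics.pstdev`) are assumptions, not something the question specifies.

```python
from statistics import mean, pstdev


def keep_inliers(durations, dates, devs=1):
    """Return (durations, dates) with outlier entries dropped from both."""
    mean_duration = mean(durations)
    std_dev_one_test = pstdev(durations)  # assumption: population std dev
    kept = [
        (duration, date)
        for duration, date in zip(durations, dates)
        if abs(duration - mean_duration) <= devs * std_dev_one_test
    ]
    # Split the surviving pairs back into two parallel lists.
    kept_durations = [duration for duration, _ in kept]
    kept_dates = [date for _, date in kept]
    return kept_durations, kept_dates


# Hypothetical sample data; the dates are just labels here.
durations = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]
dates = ["day%d" % i for i in range(len(durations))]
durations, dates = keep_inliers(durations, dates)
```

Because nothing is deleted in place, there are no indexes to shift and no offset to track.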
1

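Are you deleting items from the list, which shifts the indexes, and then using the offset to compensate?

If so, just delete from the back of the list toward the front; that way, deleting an item does not affect the positions of the items you have yet to process.

So start iterating at the last item and work toward the front of the list.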
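A rough sketch of the back-to-front deletion, keeping the variable names from the question. The sample data is invented, and computing the mean and standard deviation with the `statistics` module is an assumption, since the question does not show how they are obtained.

```python
from statistics import mean, pstdev

# Hypothetical data with two outliers (the two 200s).
durations = [10, 200, 11, 9, 10, 12, 10, 11, 9, 10, 200, 10]
dates = list(range(len(durations)))  # placeholder labels

devs = 1
mean_duration = mean(durations)
std_dev_one_test = pstdev(durations)  # assumption: population std dev

# Collect the indexes to delete, in ascending order.
deletes = [
    num for num, duration in enumerate(durations)
    if abs(duration - mean_duration) > devs * std_dev_one_test
]

# Walk the indexes from highest to lowest: removing an item only shifts
# the items after it, so the lower indexes still point at the right data.
for delete in reversed(deletes):
    del durations[delete]
    del dates[delete]
```

No `offset` variable is needed, because every index is still valid at the moment it is used.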

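These SO questions may be of interest: "Delete many elements of list (python)" and "Python: Removing list element while iterating over list".

Another good SO discussion can be found here: "Remove items from a list while iterating" (thanks to @PaulMcGuire for suggesting it in the comments).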

+0

Here is another good discussion on this topic: http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python, especially Alex Martelli's additional comments. – PaulMcG 2012-07-07 07:25:25

+0

@PaulMcGuire Thanks, that's a good link. I'll add it to my answer, if you don't mind, in case someone skips the comments. – Levon 2012-07-07 10:32:45

0

If the data set is small, you can reverse your logic and keep the good values instead of deleting the bad ones:

# keep values within `devs` standard deviations of the mean
devs = 1
keeps = []
for duration in durations:
    if (duration <= (mean_duration + (devs * std_dev_one_test))) and \
            (duration >= (mean_duration - (devs * std_dev_one_test))):
        keeps.append(duration)
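The same "keep" logic can also be expressed as a list of booleans that is applied to each parallel list with `itertools.compress`. This is only a sketch: the sample data and date strings are invented, and the population standard deviation is an assumption.

```python
from itertools import compress
from statistics import mean, pstdev

# Hypothetical sample data; the dates are plain strings for illustration.
durations = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]
dates = ["2012-07-%02d" % (i + 1) for i in range(len(durations))]

devs = 1
mean_duration = mean(durations)
std_dev_one_test = pstdev(durations)  # assumption: population std dev

# One True/False flag per entry: True means "keep this one".
mask = [abs(d - mean_duration) <= devs * std_dev_one_test for d in durations]

# Apply the same mask to both lists so they stay aligned.
durations = list(compress(durations, mask))
dates = list(compress(dates, mask))
```

The mask is built once, before either list is reassigned, so both lists are filtered against the original durations.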
3

Maybe something like this:

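import numpy as np

myList = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

mean_duration = np.mean(myList)
std_dev_one_test = np.std(myList)

def drop_outliers(x):
    # Return a boolean rather than x itself, so that a legitimate
    # value of 0 is not treated as falsy and filtered out.
    return abs(x - mean_duration) <= std_dev_one_test

# list() is needed on Python 3, where filter() returns an iterator.
myList = list(filter(drop_outliers, myList))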

Result:

>>> myList 
[1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5] 
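Since NumPy is already imported here, a boolean mask is another option and makes it easy to keep a second array aligned with the first. The following is a sketch under assumptions: the `dates` array is a placeholder, and `devs` is carried over from the question.

```python
import numpy as np

durations = np.array([1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99])
dates = np.arange(len(durations))  # placeholder for the real dates

devs = 1
# Element-wise test: True where the value lies within `devs` standard
# deviations of the mean (np.std defaults to the population std dev).
keep = np.abs(durations - durations.mean()) <= devs * durations.std()

# Index both arrays with the same mask so they stay in step.
durations = durations[keep]
dates = dates[keep]
```

Boolean indexing returns new arrays, so the originals are not modified in place and no index bookkeeping is involved.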