matplotlib: disregard outliers when plotting

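I'm plotting some data from various tests. Sometimes in a test I happen to have one outlier (say 0.1), while all the other values are three orders of magnitude smaller.

With matplotlib, I plot against the range [0, max_data_value].

How can I zoom in on my data and leave out the outliers, which would otherwise mess up the x-axis of my plot?

Should I simply take the 95th percentile and use the range [0, 95_percentile] on the x-axis?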


2012-08-09 Ricky Robinson

What kind of plot? Scatter? Histogram? – 2012-08-09 14:39:28

I'm plotting histograms. – 2012-08-09 14:41:53

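There's no single "best" test for an outlier. Ideally, you should incorporate a-priori information (e.g. "This parameter shouldn't be over x because...").

Most tests for outliers use the median absolute deviation rather than the 95th percentile or some variance-based measure. Otherwise, the variance/stddev that is calculated will be heavily skewed by the outliers themselves.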
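To make the point about skew concrete, here is a minimal sketch comparing the standard deviation with the median absolute deviation (MAD) on made-up data. The sample values, the seed, and the helper name `mad` are assumptions for illustration only; they are not part of the original answer.

```python
import numpy as np

# Hypothetical sample: 99 small values near 1e-4 plus one outlier at 0.1
rng = np.random.default_rng(0)
data = np.r_[rng.uniform(0.0, 2e-4, size=99), 0.1]
clean = data[:-1]  # the same sample without the outlier


def mad(values):
    """Median absolute deviation from the median (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    return np.median(np.abs(values - np.median(values)))


# The standard deviation changes by orders of magnitude once the outlier
# is included, while the MAD barely moves.
print("std without outlier:", clean.std())
print("std with outlier:   ", data.std())
print("MAD without outlier:", mad(clean))
print("MAD with outlier:   ", mad(data))
```

With a single extreme point the standard deviation is dominated by that point, which is why a threshold built on it tends to hide the very outlier you are trying to detect.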

Here's a function that implements one of the more common outlier tests.

import numpy as np


def is_outlier(points, thresh=3.5):
    """
    Returns a boolean array with True if points are outliers and False
    otherwise.

    Parameters:
    -----------
        points : A numobservations by numdimensions array of observations
        thresh : The modified z-score to use as a threshold. Observations with
            a modified z-score (based on the median absolute deviation) greater
            than this value will be classified as outliers.

    Returns:
    --------
        mask : A numobservations-length boolean array.

    References:
    ----------
        Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and
        Handle Outliers", The ASQC Basic References in Quality Control:
        Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
    """
    # Treat a 1-D input as a single column of observations
    if len(points.shape) == 1:
        points = points[:, None]
    # Euclidean distance of each observation from the per-dimension median
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data
    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thresh

As an example of using it, you'd do something like the following:

import numpy as np 
import matplotlib.pyplot as plt 

# The function above... In my case it's in a local utilities module 
from sci_utilities import is_outlier 

# Generate some data 
x = np.random.random(100) 

# Append a few "bad" points 
x = np.r_[x, -3, -10, 100] 

# Keep only the "good" points 
# "~" operates as a logical not operator on boolean numpy arrays 
filtered = x[~is_outlier(x)] 

# Plot the results 
fig, (ax1, ax2) = plt.subplots(nrows=2) 

ax1.hist(x) 
ax1.set_title('Original') 

ax2.hist(filtered) 
ax2.set_title('Without Outliers') 

plt.show()

[Image: the two histograms produced by the code above, titled "Original" and "Without Outliers"]


2012-08-09 15:16:03

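This is a great answer (+1 from me), but I think '~' is a bitwise not, not a logical not. It seems not to matter here, for reasons I'm not 100% clear about, but in other places it would. '~False != True', but 'not False == True' – 2012-11-13 13:24:05

Good point! In numpy, it's overloaded to operate as a logical not on boolean arrays (e.g. '~np.array(False) == True'), but this isn't the case for anything else. I should clarify that. (On a side note, by convention 'not some_array' will raise a value error if some_array has more than one element, hence the need for '~' in the example above.) – 2012-11-14 12:58:30

Thanks for the response - I actually tried 'not' and got the error you predicted, so I was even more mystified... – 2012-11-14 13:45:38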
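For anyone else puzzled by the exchange above, here is a small sketch of the difference. The array `mask` is a made-up example. A plain integer is used for the bitwise case because a Python bool is an int subclass, so `~` treats it the same way.

```python
import numpy as np

mask = np.array([True, False, True])

# On a boolean array, ~ flips each element, just like np.logical_not
inverted = ~mask
same_as_logical_not = np.array_equal(inverted, np.logical_not(mask))

# On a plain Python integer, ~ is bitwise: ~1 is -2, whereas `not 1` is False
bitwise_on_int = ~1
logical_on_int = not 1

# `not` on a multi-element array raises ValueError, because the truth value
# of the whole array is ambiguous
try:
    not mask
    raised = False
except ValueError:
    raised = True

print(inverted, same_as_logical_not, bitwise_on_int, logical_on_int, raised)
```

So for masks, `~mask` or `np.logical_not(mask)` is the thing to reach for; `not` only works on something with a single truth value.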

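If you aren't fussed about rejecting outliers as mentioned by Joe, and you are doing this for purely aesthetic reasons, you could just set your plot's x-axis limits: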

plt.xlim(min_x_data_value,max_x_data_value)

where the values are the limits you want displayed.

plt.ylim(min,max) works the same way for setting limits on the y-axis.


2012-08-09 15:23:20 Jdog

For a histogram, though, the OP would also need to recalculate the bins. Matplotlib uses fixed bin edges. It doesn't "rebin" when zooming in. – 2012-08-09 15:25:02
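One way to handle the rebinning issue raised in this comment is to pass an explicit range when the histogram is computed, so the bin edges are laid out over the region of interest rather than over [min, max]. The sketch below uses np.histogram on made-up data so the effect is visible in the numbers; the values, the seed, and the limit `upper` are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 1000 small values plus one outlier three orders larger
rng = np.random.default_rng(0)
x = np.r_[rng.uniform(0.0, 1e-4, size=1000), 0.1]

# Default bins span [min, max], so every small value lands in the first bin
counts_all, edges_all = np.histogram(x, bins=10)

# Restricting the range recomputes the bin edges over the region of interest;
# values outside the range are simply not counted
upper = 1e-4  # assumed display limit
counts_zoom, edges_zoom = np.histogram(x, bins=10, range=(0.0, upper))

print(counts_all)
print(counts_zoom)
```

The hist function in matplotlib accepts the same bins and range arguments, so something like plt.hist(x, bins=10, range=(0.0, upper)) should give a usefully binned plot without changing the axis limits afterwards.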
