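Integrate 2D kernel density estimate

I have an x,y distribution of points for which I obtain the KDE through scipy.stats.gaussian_kde. This is my code and how the output looks (the x,y data can be obtained from here):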

import numpy as np 
from scipy import stats 

# Obtain data from file. 
data = np.loadtxt('data.dat', unpack=True) 
m1, m2 = data[0], data[1] 
xmin, xmax = min(m1), max(m1) 
ymin, ymax = min(m2), max(m2) 

# Perform a kernel density estimate (KDE) on the data 
x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j] 
positions = np.vstack([x.ravel(), y.ravel()]) 
values = np.vstack([m1, m2]) 
kernel = stats.gaussian_kde(values) 
f = np.reshape(kernel(positions).T, x.shape) 

# Define the number that will determine the integration limits 
x1, y1 = 2.5, 1.5 

# Perform integration? 

# Plot the results: 
import matplotlib.pyplot as plt 
# Set limits 
plt.xlim(xmin,xmax) 
plt.ylim(ymin,ymax) 
# KDE density plot 
plt.imshow(np.rot90(f), cmap=plt.cm.gist_earth_r, extent=[xmin, xmax, ymin, ymax]) 
# Draw contour lines 
cset = plt.contour(x,y,f) 
plt.clabel(cset, inline=1, fontsize=10) 
plt.colorbar() 
# Plot point 
plt.scatter(x1, y1, c='r', s=35) 
plt.show()


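The red point at coordinates (x1, y1) has, like every point in the 2D plot, an associated value given by f (the kernel or KDE) between 0 and 0.42. Let's say that f(x1, y1) = 0.08.

I need to integrate f with integration limits in x and y given by those regions where f evaluates to less than f(x1, y1), i.e.: f(x, y) < 0.08.

From what I've seen, Python can perform numerical integration of functions and of one-dimensional arrays, but I haven't seen anything that would let me perform a numerical integration on a 2D array (the f kernel). Besides, I'm not sure how I would even recognize the regions given by that particular condition (i.e.: f(x, y) less than a given value).

Can this be done at all?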
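To make the request concrete, here is a minimal sketch of one way to do both steps on a grid: a boolean mask marks the cells where f is below the threshold, and a Riemann sum over those cells approximates the integral. The synthetic data, the point (1.5, 1.0), the padding and the grid size are all assumptions chosen for illustration, not values from the original post.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for data.dat (an assumption, not the asker's data).
rng = np.random.RandomState(0)
m1 = rng.normal(0.0, 1.0, 500)
m2 = rng.normal(0.0, 1.5, 500)
kernel = stats.gaussian_kde(np.vstack([m1, m2]))

# Pad the grid: the low-density tails belong to the region f < iso,
# so the grid has to extend beyond the data range to capture them.
pad_x, pad_y = 3 * m1.std(), 3 * m2.std()
xs = np.linspace(m1.min() - pad_x, m1.max() + pad_x, 200)
ys = np.linspace(m2.min() - pad_y, m2.max() + pad_y, 200)
x, y = np.meshgrid(xs, ys, indexing="ij")
f = kernel(np.vstack([x.ravel(), y.ravel()])).reshape(x.shape)

# Area of one grid cell, used as the weight in the Riemann sum.
cell = (xs[1] - xs[0]) * (ys[1] - ys[0])

# Density at the reference point (hypothetical point for this sketch).
x1, y1 = 1.5, 1.0
iso = kernel([[x1], [y1]])[0]

# Boolean mask of the region where the density is below the threshold.
below = f < iso

total = f.sum() * cell                  # should be close to 1
integral_below = f[below].sum() * cell  # mass in the region f < iso
print(total, integral_below)
```

Checking that `total` comes out close to 1 is a cheap way to see whether the grid is wide and fine enough before trusting `integral_below`.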

Source

2013-07-23 Gabriel

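Here is a way to do it using Monte Carlo integration. It is a little slow, and there is randomness in the solution. The error is inversely proportional to the square root of the sample size, while the running time is directly proportional to the sample size (where sample size refers to the Monte Carlo sample, 10000 in my example below, not the size of your data set). Here is some simple code using your kernel object.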

# Compute the density value below which to integrate
iso = kernel((x1, y1))

# Sample from your KDE distribution
sample = kernel.resample(size=10000)

# Filter the sample: keep the points whose density is below the threshold
insample = kernel(sample) < iso

# The integral you want is equivalent to the probability of drawing a point
# that gets through the filter
integral = insample.sum() / float(insample.shape[0])
print(integral)

I get approximately 0.2 as the answer for your data set.
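Since the estimate is a fraction of accepted samples, its statistical error can be gauged with the usual binomial standard error, sqrt(p(1 - p)/n). The sketch below is self-contained and uses synthetic data and a hypothetical reference point (both assumptions, since the original data file is not reproduced here), so the number it prints will not match the 0.2 quoted above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the asker's data (an assumption).
np.random.seed(0)
values = np.vstack([np.random.normal(0.0, 1.0, 500),
                    np.random.normal(0.0, 1.5, 500)])
kernel = stats.gaussian_kde(values)

# Density at the reference point (hypothetical point for this sketch).
x1, y1 = 1.5, 1.0
iso = kernel([[x1], [y1]])[0]

# Draw n points from the KDE and keep those with density below iso.
n = 10000
sample = kernel.resample(size=n)
insample = kernel(sample) < iso

# Fraction accepted = Monte Carlo estimate of the integral.
integral = insample.mean()

# Binomial standard error: shrinks like 1/sqrt(n).
std_err = np.sqrt(integral * (1.0 - integral) / n)
print("integral = %.3f +/- %.3f" % (integral, std_err))
```

With n = 10000 the standard error cannot exceed 0.005, and quadrupling n halves it, at four times the running time.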

Source

2013-07-24 00:48:47 jcrudy

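Amazingly simple, I clearly need to read up on some statistics. Thank you very much! – Gabriel

Note that this Monte Carlo implementation may not be correct. See here: http://stackoverflow.com/a/35903712/1391441 – Gabriel

@Gabriel I think this solution is actually correct for this question. I looked at the other question you linked to. Here are my thoughts. There are two different integration regions at play. In this question, you indicate pretty clearly that you want to integrate over f(x, y) … – jcrudy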

A direct way is to integrate:

import matplotlib.pyplot as plt
import numpy as np
from scipy import integrate
from sklearn.neighbors import KernelDensity

mean = [0, 0]
cov = [[5, 0], [0, 10]]
x, y = np.random.multivariate_normal(mean, cov, 5000).T
plt.plot(x, y, 'o')
plt.show()

# Fit the KDE on an (n_samples, 2) array
sample = np.column_stack([x, y])
kde = KernelDensity().fit(sample)

def f_kde(x, y):
    # score_samples returns the log-density, so exponentiate it
    return np.exp(kde.score_samples([[x, y]]))[0]

# Upper integration limits (the point from the question)
x1, y1 = 2.5, 1.5
print(integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]]))

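The problem is that this is very slow if you do it on a large scale. For example, if you want to plot the x,y line for x in (0, 100), it takes a long time to calculate.

Note: I used the KDE from sklearn, but I believe you can swap in another implementation as well.

Using the kernel as defined in the original question: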

import numpy as np
from scipy import stats
from scipy import integrate

def integ_func(kde, x1, y1):

    def f_kde(x, y):
        # gaussian_kde returns a length-1 array; nquad needs a scalar
        return kde((x, y))[0]

    # nquad returns a tuple: (integral estimate, absolute error estimate)
    integ = integrate.nquad(f_kde, [[-np.inf, x1], [-np.inf, y1]])

    return integ

# Obtain data from file.
data = np.loadtxt('data.dat', unpack=True)
# Perform a kernel density estimate (KDE) on the data
kernel = stats.gaussian_kde(data)

# Define the number that will determine the integration limits
x1, y1 = 2.5, 1.5
print(integ_func(kernel, x1, y1))
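For a rectangular region like the one integrated here, scipy's gaussian_kde also offers an integrate_box method, which may be a much faster alternative to calling nquad on the density. The sketch below is only an illustration: the data are synthetic, and the finite lower bounds placed far below the data are an assumption standing in for minus infinity.

```python
import numpy as np
from scipy import stats

# Synthetic data with the same mean and covariance as the sklearn example
# (an assumption; the asker's data.dat is not used here).
np.random.seed(0)
data = np.random.multivariate_normal([0, 0], [[5, 0], [0, 10]], 1000).T
kernel = stats.gaussian_kde(data)

# Upper limits of the rectangular region.
x1, y1 = 2.5, 1.5

# Finite lower bounds far below the data stand in for -inf (assumption).
low = data.min(axis=1) - 10 * data.std(axis=1)

# Integral of the KDE over the box [low, (x1, y1)].
box = kernel.integrate_box(low, [x1, y1])

# Rough sanity check: fraction of data points inside the same region.
empirical = np.mean((data[0] <= x1) & (data[1] <= y1))
print(box, empirical)
```

The two printed numbers should be close but not identical, because the KDE smooths the data. Note that this only covers rectangular limits; it does not handle the region f(x, y) < f(x1, y1) asked about in the question.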

Source

2016-02-23 07:12:48 cqcn1991

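cqcn1991 I can't get this example to work. Could you expand your code so that it runs? – Gabriel

@Gabriel I changed it to a complete example, but omitted some 'import's. 'import' in Python is just a disaster to me. – cqcn1991

'KernelDensity' is not imported properly, you should use: 'from sklearn.neighbors import KernelDensity'. Also, how are you defining 'inf'? I get a 'NameError: name 'inf' is not defined' with your code. And where is the point '(x1, y1)' that should be used as the integration limit? – Gabriel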
