Question about simulating the central limit theorem, from the book Data Science from Scratch

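I am reading the book "Data Science from Scratch" by Joel Grus. My question concerns Chapter 6, where the author uses binomial random variables to simulate the central limit theorem.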

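The result is a chart showing the probability distribution of the binomial trials together with the normal-distribution approximation. The two plots should look very similar. The book shows a chart like this:

[Figure: the author's chart, "Binomial Distribution vs. Normal Approximation"]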

The code he provides is:

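import math
import random
from collections import Counter
from matplotlib import pyplot as plt

def normal_cdf(x, mu=0, sigma=1):
    # normal CDF, as defined earlier in the book
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def bernoulli_trial(p):
    # 1 with probability p, otherwise 0
    return 1 if random.random() < p else 0

def binomial(n, p):
    # number of successes in n independent Bernoulli(p) trials
    return sum(bernoulli_trial(p) for _ in range(n))

def make_hist(p, n, num_points):
    data = [binomial(n, p) for _ in range(num_points)]

    # use a bar chart to show the simulated binomial frequencies;
    # align='edge' makes each bar span x-0.4..x+0.4 (matplotlib 2.0+ centers bars by default)
    histogram = Counter(data)
    plt.bar([x - 0.4 for x in histogram.keys()],
            [v / num_points for v in histogram.values()],
            0.8,
            color='0.75',
            align='edge')

    mu = p * n
    sigma = math.sqrt(n * p * (1 - p))

    # use a line chart to show the normal approximation
    xs = range(min(data), max(data) + 1)
    ys = [normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma)
          for i in xs]
    plt.plot(xs, ys)
    plt.title('Binomial Distribution vs. Normal Approximation')
    plt.show()

make_hist(0.75, 100, 10000)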
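make_hist uses mu = p * n and sigma = math.sqrt(n * p * (1 - p)) because those are the mean and standard deviation of a Binomial(n, p) variable. As a minimal sketch (no plotting; it reuses the book's bernoulli_trial and binomial, while the helper sample_stats and the fixed seed are assumptions for illustration), you can check that the simulated samples actually match those two values:

```python
import math
import random
import statistics

def bernoulli_trial(p):
    # 1 with probability p, otherwise 0
    return 1 if random.random() < p else 0

def binomial(n, p):
    # number of successes in n independent Bernoulli(p) trials
    return sum(bernoulli_trial(p) for _ in range(n))

def sample_stats(p, n, num_points):
    # hypothetical helper: empirical mean and std of num_points binomial draws
    data = [binomial(n, p) for _ in range(num_points)]
    return statistics.mean(data), statistics.pstdev(data)

random.seed(0)  # fixed seed so the run is reproducible
p, n = 0.75, 100
mean, std = sample_stats(p, n, 10000)
mu = p * n                          # 75.0
sigma = math.sqrt(n * p * (1 - p))  # about 4.33
print(f"sample mean {mean:.2f} vs mu {mu:.2f}")
print(f"sample std  {std:.2f} vs sigma {sigma:.2f}")
```

With 10000 draws the sample mean and standard deviation should land close to mu and sigma, which is why the normal curve can be overlaid on the histogram at all.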

My question is about this line: [normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma) for i in xs]. Why does the author use +0.5 and -0.5? Is there a specific reason for this?

I wonder whether anyone has run into this. Thanks in advance!


2017-10-29 Tian Park

normal_cdf is evaluated over the vector x/10.0 for x in range(-50, 50), right? – skrubber

Yes. The function is: def normal_cdf(x, mu=0, sigma=1): return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2. The x-axis is: xs = [x / 10.0 for x in range(-50, 50)]. Why divide by 10? –

I guess it's just to scale the chart so the x-axis runs from -5 to 5? –
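Dividing by 10 turns the integers -50..49 into a grid from -5.0 to 4.9 with spacing 0.1, fine enough to draw a smooth curve of the standard normal CDF. A small sketch of that grid using the normal_cdf quoted in the comment above (no plotting):

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    # CDF of N(mu, sigma^2) via the error function
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

# integers -50..49 scaled down to -5.0..4.9 with spacing 0.1
xs = [x / 10.0 for x in range(-50, 50)]
ys = [normal_cdf(x) for x in xs]

print(len(xs), xs[0], xs[-1])   # 100 -5.0 4.9
print(normal_cdf(0))            # 0.5
```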

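In the xs variable you have a list of X coordinates with a step of 1, e.g. [5, 6, 7, 8, 9, 10, ...]. In ys you need the corresponding Y coordinates, and normal_cdf(i + 0.5, mu, sigma) - normal_cdf(i - 0.5, mu, sigma) in the code is the integral of the normal density from i - 0.5 to i + 0.5, i.e. over an interval of width (i + 0.5) - (i - 0.5) = 1, which is the same as the step. In other words, each integer i gets the normal probability of the unit-wide bin centered on it, which approximates P(X = i) for the binomial; this is the usual continuity correction.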
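To see why the ±0.5 matters, here is a minimal sketch (assuming Python 3.8+ for math.comb; the names binomial_pmf and total_error are hypothetical) that compares the exact binomial probabilities with two normal approximations: the book's bins [i - 0.5, i + 0.5] and, for contrast, bins [i, i + 1] that are shifted by half a unit. The centered bins should track the exact values more closely, and because consecutive bins share endpoints their probabilities telescope to normal_cdf(n + 0.5) - normal_cdf(-0.5).

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def binomial_pmf(k, n, p):
    # exact P(X = k) for X ~ Binomial(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.75
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
ks = range(n + 1)

exact = [binomial_pmf(k, n, p) for k in ks]
# book's version: normal mass on the unit-wide bin centered at k
centered = [normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma) for k in ks]
# for contrast: unit-wide bin starting at k (off by half a unit)
shifted = [normal_cdf(k + 1, mu, sigma) - normal_cdf(k, mu, sigma) for k in ks]

def total_error(approx):
    # sum of absolute differences from the exact pmf
    return sum(abs(a - e) for a, e in zip(approx, exact))

print(f"centered bins error: {total_error(centered):.4f}")
print(f"shifted bins error:  {total_error(shifted):.4f}")
```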

A version that is easier to understand would look like this (with normal_cdf as defined in the book):

step = 1  # range() only accepts integers, so the step must be an int
xs = range(min(data), max(data) + 1, step)
ys = [normal_cdf(i + step / 2, mu, sigma) - normal_cdf(i - step / 2, mu, sigma) for i in xs]
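Since range() cannot take a fractional step, a general bin width has to be handled differently. The sketch below is an illustration only: the helper bin_probabilities is hypothetical, and mu/sigma are taken from the book's make_hist(0.75, 100, 10000) call. It builds the bin centers from an integer count and assigns each center the normal mass of [x - step/2, x + step/2]:

```python
import math

def normal_cdf(x, mu=0, sigma=1):
    return (1 + math.erf((x - mu) / math.sqrt(2) / sigma)) / 2

def bin_probabilities(start, stop, step, mu, sigma):
    # hypothetical helper: normal mass of each bin [x - step/2, x + step/2]
    # for bin centers x = start, start + step, ... up to stop (inclusive).
    # range() only accepts integers, so centers come from an integer count.
    count = int(round((stop - start) / step)) + 1
    xs = [start + j * step for j in range(count)]
    ys = [normal_cdf(x + step / 2, mu, sigma) - normal_cdf(x - step / 2, mu, sigma)
          for x in xs]
    return xs, ys

p, n = 0.75, 100
mu, sigma = p * n, math.sqrt(n * p * (1 - p))
xs1, ys1 = bin_probabilities(55, 95, 1, mu, sigma)    # the book's case: unit bins
xs2, ys2 = bin_probabilities(55, 95, 0.5, mu, sigma)  # half-unit bins
print(len(xs1), len(xs2))  # 41 81
print(round(sum(ys1), 4), round(sum(ys2), 4))
```

Only step = 1 makes each bar comparable to P(X = i), because the binomial takes integer values; with half-unit bins each value is the probability of a half-unit interval, so the heights are roughly halved.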


2017-10-29 10:56:38

I see! So range(min(data), max(data) + 1) in the example actually means the step is 1. –
