Pyplot scatterplot legend not working with smaller sample sizes

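I am using the code below to generate a scatter plot in pyplot, and I want each of the 9 classes to be drawn in a different color. Each class has multiple points.

I cannot figure out why the legend does not work for smaller sample sizes.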

def plot_scatter_test(x, y, c, title): 
    data = pd.DataFrame({'x': x, 'y': y, 'c': c}) 
    classes = len(np.unique(c)) 
    colors = cm.rainbow(np.linspace(0, 1, classes)) 

    ax = plt.subplot(111) 
    for s in range(0,classes):
        ss = data[data['c']==s]
        plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)

    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, bbox_to_anchor=(0, -.4), title='Legend') 
    plt.show()

My data looks like this

When I plot it by calling

plot_scatter_test(test['x'], test['y'],test['group'])

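I get a plot in which the points are drawn in different colors, but the legend shows a single color.

So, to make sure my data was fine, I created a random DataFrame with the same kind of data. Now I get different colors, but something is still wrong, because they are not consecutive.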

test2 = pd.DataFrame({ 
    'y': np.random.uniform(0,1400,36), 
    'x': np.random.uniform(-250,-220,36), 
    'group': np.random.randint(0,9,36) 
}) 
plot_scatter_test(test2['x'], test2['y'],test2['group'])

Finally, I created a larger plot with 360 data points, and everything looks the way I want it to. What am I doing wrong?

test3 = pd.DataFrame({ 
    'y': np.random.uniform(0,1400,360), 
    'x': np.random.uniform(-250,-220,360), 
    'group': np.random.randint(0,9,360) 
}) 

plot_scatter_test(test3['x'], test3['y'],test3['group'])

Source

2017-04-19 ElPresidente

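The way you try to assign the colors makes no sense to me. Can you state precisely what the colors should represent? – ImportanceOfBeingErnest

The colors are just an arbitrary visual distinction between the groups. I kept looking into it and happened to find a fix, which I posted as an answer. – ElPresidente

That is not a fix, see my answer. – ImportanceOfBeingErnest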

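You need to make sure not to confuse the class itself with the number you use for indexing.

To see better what I mean, use the following dataset with your function: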

np.random.seed(22) 
X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8)) 
test2 = pd.DataFrame({ 
    'y': Y.flatten(), 
    'x': X.flatten(), 
    'group': np.random.randint(0,9,len(X.flatten())) 
}) 
plot_scatter_test(test2['x'], test2['y'],test2['group'])

This results in the following plot, in which points are missing.
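The reason points go missing can be seen without plotting anything. The sketch below uses a small, made-up list of group labels (not the data from the question) in which some groups never occur, and uses sorted(set(...)) as a plain-Python stand-in for np.unique. It compares a loop that treats the loop index as the class label with one that keeps the two apart via enumerate.

```python
# Hypothetical group labels: groups 1, 3, 5 and 7 never occur.
groups = [0, 2, 2, 4, 6, 8, 8, 0]

# Sorted unique labels, the plain-Python analogue of np.unique(groups).
classes = sorted(set(groups))

# Index-based loop: range(len(classes)) yields 0..4 and uses those as labels.
selected_by_index = set(range(len(classes)))
plotted_by_index = [g for g in groups if g in selected_by_index]

# Label-based loop: i would pick the color, label does the filtering.
plotted_by_label = [
    g for i, label in enumerate(classes) for g in groups if g == label
]

# Labels that the index-based loop never selects.
missing = sorted(set(groups) - selected_by_index)
print(missing)
```

With five distinct labels, the index-based loop only ever asks for labels 0 to 4, so every point in a higher-numbered group is silently skipped, while the label-based loop covers all points.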

So make a clear distinction between the index and the class, e.g. as follows:

import numpy as np; np.random.seed(22)
import matplotlib.pyplot as plt
import pandas as pd

def plot_scatter_test(x, y, c, title="title"):
    data = pd.DataFrame({'x': x, 'y': y, 'c': c})
    # class labels that actually occur in the data
    classes = np.unique(c)
    print(classes)
    # one RGBA color per occurring class
    colors = plt.cm.rainbow(np.linspace(0, 1, len(classes)))
    print(colors)
    ax = plt.subplot(111)
    # i indexes the color array, clas is the label used for filtering
    for i, clas in enumerate(classes):
        ss = data[data['c']==clas]
        # repeat the color once per point instead of passing a bare 4-tuple
        plt.scatter(ss["x"],ss["y"],c=[colors[i]]*len(ss), label=clas)

    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, title='Legend')
    plt.show()

X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8))
test2 = pd.DataFrame({
    'y': Y.flatten(),
    'x': X.flatten(),
    'group': np.random.randint(0,9,len(X.flatten()))
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])

Apart from that, it is indeed necessary not to supply the color 4-tuple directly to c, because it would be interpreted as four single colors.
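A single RGBA 4-tuple passed as c has the same shape as four scalar values, which is where the ambiguity comes from. One way to sidestep it, sketched here with NumPy only, is to expand the color into a 2D array with one RGBA row per point. The helper name repeat_color and the RGBA value are made up for illustration. The effect should match the [colors[i]]*len(ss) expression used in the answer.

```python
import numpy as np

def repeat_color(rgba, n_points):
    """Return an (n_points, 4) array with the same RGBA color in every row.

    Hypothetical helper, intended to match [rgba] * n_points in effect.
    """
    rgba = np.asarray(rgba, dtype=float)
    if rgba.shape != (4,):
        raise ValueError("expected a single RGBA 4-tuple")
    return np.tile(rgba, (n_points, 1))

purple = (0.5, 0.0, 1.0, 1.0)  # illustrative RGBA value, not from the question
color_rows = repeat_color(purple, 4)
print(color_rows.shape)  # (4, 4): four points, one RGBA row each
```

The result could then be passed as c=repeat_color(colors[i], len(ss)) in the scatter call. The Matplotlib documentation for scatter also describes passing a 2D array with a single row when all points should share one RGB or RGBA value.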

Source

2017-04-19 19:09:15 ImportanceOfBeingErnest

-1

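I feel silly now after staring at this for a while. The error was in how the colors were being passed. I was passing a single color to the .scatter function. However, since there are multiple points, you need to pass the same number of colors. So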

plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)

can be changed to

plt.scatter(x=ss['x'], y=ss['y'],c=[colors[s]]*len(ss), label=s)

Source

2017-04-19 18:22:04 ElPresidente

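Careful, this still does not make sense, because you are mixing the classes with the indices of the unique classes. – ImportanceOfBeingErnest

I see what you mean. In my data I always have a consecutive series of groups from 0 to N, so I just iterated over the index. It fails in your example because not all groups are necessarily present. Your answer is the more complete solution. Thanks. – ElPresidente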
