Aerospike error: All batch queues are full

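I'm running an Aerospike cluster in Google Cloud. Following the advice in this post, I updated to the latest version (3.11.1.1) and recreated all servers. This change brought my 5 servers down to a much lower CPU load: about 75% before, about 20% now, as my monitoring graph showed.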

Because the load was so low, I decided to shrink the cluster to 4 servers. When I did that, my application started receiving the following error:

All batch queues are full

I found this discussion about the topic, which suggests changing the parameters batch-index-threads and batch-max-unused-buffers with

asadm -e "asinfo -v 'set-config:context=service;batch-index-threads=NEW_VALUE'"
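Trying many values by hand is error-prone, so the command strings can be generated instead. A minimal Python sketch: the helper `set_config_command` is hypothetical and only formats a string in the same asadm/asinfo syntax as the command above; it does not run anything or contact the cluster.

```python
def set_config_command(context: str, **params: object) -> str:
    """Format an asadm command that applies a dynamic set-config change.

    Hypothetical helper: it only builds the string, it does not execute it.
    Keyword names use underscores and are converted to the hyphenated
    parameter names (batch_index_threads -> batch-index-threads).
    """
    parts = [f"context={context}"]
    for name, value in params.items():
        # Render booleans as lowercase true/false, e.g. enable-...=true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{name.replace('_', '-')}={text}")
    return "asadm -e \"asinfo -v 'set-config:" + ";".join(parts) + "'\""


if __name__ == "__main__":
    # One command per candidate value tried in the question.
    for threads in (2, 4, 8, 16):
        print(set_config_command("service", batch_index_threads=threads))
```

The same helper also covers namespace-level settings, e.g. `set_config_command("namespace", id="test", enable_benchmarks_batch_sub=True)`.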

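I tried the command with many combinations of values (batch-index-threads set to 2, 4, 8 and 16), and none of them solved the problem. I also tried changing the batch-index-threads param; that didn't solve it either. I keep getting the All batch queues are full error.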

Here is the relevant part of my aerospike.conf:

service { 
    user root 
    group root 
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1. 
    paxos-recovery-policy auto-reset-master 
    pidfile /var/run/aerospike/asd.pid 
    service-threads 32 
    transaction-queues 32 
    transaction-threads-per-queue 4 
    batch-index-threads 40 
    proto-fd-max 15000 
    batch-max-requests 30000 
    replication-fire-and-forget true 
}

I'm using 300GB SSD disks on these servers.


2017-03-02 Daniel Cukier

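I suspect you are hitting the disk IOPS limit. If you hit it, batch operations will spend their time waiting on disk IO. That would also explain the lower CPU utilization, because Aerospike can't get much work done. You can enable detailed batch benchmarks with: asadm -e "asinfo -v 'set-config:context=namespace;id=test;enable-benchmarks-batch-sub=true'". It will show where batch calls spend most of their time. – sunil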

A quick note that may or may not apply to you:

A common mistake we have seen in the past is that developers decide to use 'batch get' as a general purpose 'get' for single and multiple record requests. The single record get will perform better for single record requests.
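The advice above can be sketched as a small dispatch function: use a single-record get for exactly one key and a batch get only for several. This is a minimal Python sketch under assumptions: `fetch_records` is hypothetical, and the two callables stand in for your client's single-record and batch read calls (for example `get` and `get_many` in releases of the Aerospike Python client that provide them).

```python
from typing import Any, Callable, List, Sequence


def fetch_records(
    keys: Sequence[Any],
    get_one: Callable[[Any], Any],
    get_batch: Callable[[Sequence[Any]], Sequence[Any]],
) -> List[Any]:
    """Read records, using the batch call only when there is more than one key.

    Hypothetical helper: get_one and get_batch stand in for the client's
    single-record and batch read calls.
    """
    if not keys:
        return []  # nothing to read; skip sending an empty batch request
    if len(keys) == 1:
        # A single key is served by a plain read instead of a batch request.
        return [get_one(keys[0])]
    return list(get_batch(keys))
```

With a client object this might be called as `fetch_records(keys, client.get, client.get_many)`; passing the calls in keeps the sketch independent of any particular client version.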

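You may be limited by the network between the clients and the servers. Going from 5 nodes to 4 reduced the total network capacity of the cluster. Also, removing a node triggers cluster migrations, which add extra network load.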


2017-03-02 18:19:15 kporter

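My clients are 1-CPU servers, which limits them to 2Gbps. According to the network monitoring charts, these servers run at 150MB/s (1.2Gbps), well below the cap. My servers go up to 8Gbps and run at 600MB/s (4.8Gbps). (https://cloud.google.com/compute/docs/networks-and-firewalls#egress_throughput_caps) –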
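The throughput figures in this comment can be checked with a little arithmetic. A minimal Python sketch that uses only the numbers quoted above (150MB/s against a 2Gbps client cap, 600MB/s against an 8Gbps server cap) and assumes decimal units (1 byte = 8 bits, 1 Gbps = 1000 Mbps); the function names are illustrative, and the aggregate cap assumes every node has the same 8Gbps cap.

```python
def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


def utilization(observed_mb_per_s: float, cap_gbps: float) -> float:
    """Fraction of the egress cap used by the observed throughput."""
    return mb_per_s_to_gbps(observed_mb_per_s) / cap_gbps


def cluster_cap_gbps(nodes: int, per_node_cap_gbps: float) -> float:
    """Aggregate egress cap, assuming every node has the same per-node cap."""
    return nodes * per_node_cap_gbps


if __name__ == "__main__":
    print(f"client: {mb_per_s_to_gbps(150):.1f} Gbps, {utilization(150, 2):.0%} of cap")
    print(f"server: {mb_per_s_to_gbps(600):.1f} Gbps, {utilization(600, 8):.0%} of cap")
    # Removing one node lowers the aggregate server-side cap.
    print(f"cluster cap: {cluster_cap_gbps(5, 8):.0f} -> {cluster_cap_gbps(4, 8):.0f} Gbps")
```

Both clients and servers sit at about 60% of their caps; the last line illustrates the earlier point that dropping from 5 to 4 nodes shrinks the total capacity available.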

I would look at the batch-max-buffer-per-queue config parameter.

Maximum number of 128KB response buffers allowed in each batch index queue. If all batch index queues are full, new batch requests are rejected.

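Along with increasing this value from its default of 255, you will also want to raise batch-max-unused-buffers to (at least) batch-index-threads x batch-max-buffer-per-queue + 1. If you don't, new buffers will constantly be created and destroyed, because the number of free (unused) buffers allowed is smaller than the number of buffers you are using. Once a batch response has been delivered, the system tries to trim the buffer pool back down to the max unused count. You will see this reflected in the batch_index_created_buffers metric constantly rising.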
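The sizing rule above is simple enough to encode. A minimal Python sketch; the function names are hypothetical, and the rule itself (at least batch-index-threads x batch-max-buffer-per-queue + 1 unused buffers) is taken from this answer rather than quoted from the Aerospike documentation.

```python
def min_unused_buffers(batch_index_threads: int, batch_max_buffer_per_queue: int) -> int:
    """Smallest batch-max-unused-buffers that avoids buffer churn, per the rule above."""
    return batch_index_threads * batch_max_buffer_per_queue + 1


def buffers_will_churn(batch_max_unused_buffers: int, batch_index_threads: int,
                       batch_max_buffer_per_queue: int) -> bool:
    """True if freed buffers would be destroyed and later recreated
    (the situation that shows up as batch_index_created_buffers rising)."""
    needed = min_unused_buffers(batch_index_threads, batch_max_buffer_per_queue)
    return batch_max_unused_buffers < needed


if __name__ == "__main__":
    # 40 threads (from the posted aerospike.conf) with the default 255 buffers per queue.
    print(min_unused_buffers(40, 255))
    # 13000 unused buffers against 40 threads x 320 buffers per queue.
    print(buffers_will_churn(13000, 40, 320))
```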

Note that you need enough DRAM for this. For example, if you raise batch-max-buffer-per-queue to 320, each node will consume

40 (`batch-index-threads`) x 320 (`batch-max-buffer-per-queue`) x 128K = 1600MB

For performance, batch-max-unused-buffers should then be set to 13000, which gives a maximum memory consumption of 1625MB (1.59GB).
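The memory figures above follow from the 128KB buffer size. A minimal Python sketch of that arithmetic; `buffer_memory_mb` is a hypothetical helper, and it uses binary units (1MB = 1024KB, 1GB = 1024MB), which is what makes 12,800 buffers come out at exactly 1600MB.

```python
BUFFER_KB = 128  # size of one batch response buffer, per the documentation quoted above


def buffer_memory_mb(buffer_count: int) -> float:
    """DRAM needed for buffer_count batch buffers, in MB (1 MB = 1024 KB)."""
    return buffer_count * BUFFER_KB / 1024


if __name__ == "__main__":
    in_queues = buffer_memory_mb(40 * 320)  # batch-index-threads x batch-max-buffer-per-queue
    unused = buffer_memory_mb(13000)        # batch-max-unused-buffers
    print(f"{in_queues:.0f} MB held in queues")                      # 1600 MB held in queues
    print(f"{unused:.0f} MB ({unused / 1024:.2f} GB) unused cap")    # 1625 MB (1.59 GB) unused cap
```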


2017-03-10 23:57:16

I did change 'batch-max-unused-buffers' to 21000 (40 * 512 plus some extra buffers), but the problem persists. I still see the same error, and 'batch_index_created_buffers' keeps going up –

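That is a sign that there aren't enough unused buffers around, but I'm puzzled as to why, since your value is larger than 'batch-index-threads' x 'batch-max-buffer-per-queue'... Could you double-check the parameter settings? –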
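One way to double-check the running values is to read them back with asinfo -v 'get-config:context=service' and compare them against the sizing rule. A minimal Python sketch, assuming that command prints the service configuration as semicolon-separated key=value pairs; the helper names are hypothetical and the sample string is illustrative, not real output from this cluster.

```python
from typing import Dict, Tuple


def parse_config(raw: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' text into a dict (assumed asinfo output format)."""
    config: Dict[str, str] = {}
    for item in raw.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            config[key] = value
    return config


def check_unused_buffers(config: Dict[str, str]) -> Tuple[bool, int]:
    """Return (ok, needed): whether batch-max-unused-buffers covers
    batch-index-threads x batch-max-buffer-per-queue + 1."""
    threads = int(config["batch-index-threads"])
    per_queue = int(config["batch-max-buffer-per-queue"])
    unused = int(config["batch-max-unused-buffers"])
    needed = threads * per_queue + 1
    return unused >= needed, needed


if __name__ == "__main__":
    # Illustrative values matching the comment above (40 x 512, 21000 unused).
    sample = "batch-index-threads=40;batch-max-buffer-per-queue=512;batch-max-unused-buffers=21000"
    print(check_unused_buffers(parse_config(sample)))  # (True, 20481)
```

Running the check against the live output rather than the config file helps rule out the case where a value was changed in one place but not the other.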

By the way, this knowledge base article discusses the topic: https://discuss.aerospike.com/t/batch-full-error/4329 –
