2013-04-09

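How to handle the AllServersUnavailable exception

I want to run a simple write test against a Cassandra instance (v1.1.10) on a single node. I just want to see how it handles constant writes and whether it can keep up with the write rate.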

import random
import string
import sys
import uuid

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('testdb')
test_cf = ColumnFamily(pool, 'test')
test2_cf = ColumnFamily(pool, 'test2')
test3_cf = ColumnFamily(pool, 'test3')
# Batch mutators buffer inserts and send them once queue_size is reached
test_batch = test_cf.batch(queue_size=1000)
test2_batch = test2_cf.batch(queue_size=1000)
test3_batch = test3_cf.batch(queue_size=1000)

chars = string.ascii_uppercase
counter = 0
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    test_batch.insert(uid, {'junk': junk})
    test2_batch.insert(uid, {'junk': junk})
    test3_batch.insert(uid, {'junk': junk})
    sys.stdout.write(str(counter) + '\n')

# Never reached: the loop above only ends when an exception is raised
pool.dispose()

After writing for a long time (once the counter reaches roughly 10M or more), the code crashes with the following message:

pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

I set queue_size=100, which did not help. After the script crashed I also started the cqlsh -3 console to truncate the tables, and got the following error:

Unable to complete request: one or more nodes were unavailable.

Tailing /var/log/cassandra/system.log shows no sign of errors, only INFO messages about compaction, FlushWriter and so on. What am I doing wrong?


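Do you see excessive CPU or disk usage on that node? The JVM may be struggling with garbage collection, although I would expect the logs to show something about that. – 2013-04-15 23:09:14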

Answer


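I have had this problem too. As @tyler-hobbs suggested in his comment, the node is probably overloaded (it was for me). A simple workaround I have used is to back off and let the node catch up. I have rewritten the loop above to catch the error, sleep for a while and try again. I have run this against a single-node cluster and it works, pausing (for a minute) and backing off periodically (never more than 5 times in a row). No data is lost with this script unless the error is thrown five times in a row (in which case you probably want to fail hard rather than return to the loop).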

# Replacement for the loop above; also needs `import time` and `import pycassa`
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    tryCount = 5  # 5 is probably unnecessarily high
    while tryCount > 0:
        try:
            test_batch.insert(uid, {'junk': junk})
            test2_batch.insert(uid, {'junk': junk})
            test3_batch.insert(uid, {'junk': junk})
            tryCount = -1  # success: leave the retry loop
        except pycassa.pool.AllServersUnavailable as e:
            print("Trying to insert [" + str(uid) + "] but got error " + str(e) + " (attempt " + str(tryCount) + "). Backing off for a minute to let Cassandra settle down")
            time.sleep(60)  # A delay of 60s is probably unnecessarily high
            tryCount = tryCount - 1
    sys.stdout.write(str(counter) + '\n')

I have added a complete gist here.
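The back-off idea in this answer can also be factored into a small helper. The following is only a sketch and not the code from the gist: `call_with_backoff` and its parameters are hypothetical names, and it swaps the fixed 60-second sleep for a delay that doubles after each failure. It also re-raises the last error, which is one way to "fail hard" once the attempts run out.

```python
import time


def call_with_backoff(operation, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call operation(); if it raises retry_on, wait and try again.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...)
    and is capped at max_delay. When the last attempt fails, the exception
    is re-raised so the caller fails hard instead of silently moving on.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: propagate the last error
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))


# Hypothetical usage with the names from the question (not executed here):
#   call_with_backoff(lambda: test_batch.insert(uid, {'junk': junk}),
#                     retry_on=pycassa.pool.AllServersUnavailable)
```

Passing `sleep` as a parameter is only there so the helper can be exercised without real waiting. Whether doubling delays suit an overloaded node better than a fixed pause depends on how long the node needs to catch up.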