2015-07-12 79 views
1

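Couchbase NodeUnavailableException in the .NET SDK

We frequently hit this exception in production code, without any increase in the number of requests to Couchbase or any memory pressure on the server itself. The nodes have 30 GB of RAM allocated and usage peaks at 3 GB, yet this exception is now thrown every so often. The bucket is opened only once per application lifetime, and after that only get and insert operations are performed. The connection is initialized like this: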

Config = new ClientConfiguration()
{
    Servers = serverList,
    UseSsl = false,
    DefaultOperationLifespan = 2500,
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName,
            new BucketConfiguration
            {
                BucketName = bucketName,
                UseSsl = false,
                DefaultOperationLifespan = 2500,
                PoolConfiguration = new PoolConfiguration
                {
                    MaxSize = 2000,
                    MinSize = 200,
                    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
                }
            }
        }
    }
};

Cluster = new Cluster(Config); 
Bucket = Cluster.OpenBucket(); 

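Could you let me know whether this initialization is correct and, more importantly, what I should check on the Couchbase server to find the cause of this problem? I checked all the logs on the server but could not find anything unusual around the times these errors occurred.

Thanks

Stack trace: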

System.Exception.Couchbase exception 
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get() 
at ###.API.Services.BaseService`1.SetUserID() 
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.EventsService.GetResponse() 
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.BaseService`1.Any() 
at lambda_method() 
at ServiceStack.Host.ServiceRunner`1.Execute() 
at ServiceStack.Host.ServiceRunner`1.Process() 
at ServiceStack.Host.ServiceExec`1.Execute() 
at ServiceStack.Host.ServiceRequestExec`2.Execute() 
at ServiceStack.Host.ServiceController.ManagedServiceExec() 
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f() 
at ServiceStack.Host.ServiceController.Execute() 
at ServiceStack.HostContext.ExecuteService() 
at ServiceStack.Host.RestHandler.ProcessRequestAsync() 
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest() 
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 
at System.Web.HttpApplication.ExecuteStep() 
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps() 
at System.Web.HttpApplication.BeginProcessRequestNotification() 
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
Caused by: System.Exception : Couchbase.Core.NodeUnavailableException: The node 172.31.34.105:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception. 
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get() 
at ###.API.Services.BaseService`1.SetUserID() 
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.EventsService.GetResponse() 
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.BaseService`1.Any() 
at lambda_method() 
at ServiceStack.Host.ServiceRunner`1.Execute() 
at ServiceStack.Host.ServiceRunner`1.Process() 
at ServiceStack.Host.ServiceExec`1.Execute() 
at ServiceStack.Host.ServiceRequestExec`2.Execute() 
at ServiceStack.Host.ServiceController.ManagedServiceExec() 
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f() 
at ServiceStack.Host.ServiceController.Execute() 
at ServiceStack.HostContext.ExecuteService() 
at ServiceStack.Host.RestHandler.ProcessRequestAsync() 
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest() 
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 
at System.Web.HttpApplication.ExecuteStep() 
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps() 
at System.Web.HttpApplication.BeginProcessRequestNotification() 
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
+0

Do you have a stack trace? – rene

+0

Hi @rene. I have now updated the question with the stack trace. Thanks –

+1

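I'm not a Couchbase user, but I expect you will need to look into network connectivity. It is less likely that your client code or server-side settings are badly wrong than that some network component between the client and the server is temporarily refusing connections. – rene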

Answers

1

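A NodeUnavailableException can be returned for any number of network-related issues... However, since you mentioned that you are running on AWS, it is possible that the TCP keep-alive settings need to be tuned on the client.

Your MinSize for connections (200) is very large. You are unlikely to use all of them, so they sit idle until the AWS LB decides to close them. When this happens, the SDK temporarily marks the failed node as down (for 1000 ms) and then tries to reconnect. During this time, any key mapped to that node will fail with that exception.

This blog post describes how to set the TCP keep-alive time and interval: http://blog.couchbase.com/introducing-couchbase-.net-sdk-2.1.0-the-asynchronous-couchbase-.net-client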

var config = new ClientConfiguration 
{ 
    EnableTcpKeepAlives = true, // default is true
    TcpKeepAliveTime = 1000*60*60, // idle time before the first keep-alive: 60 minutes
    TcpKeepAliveInterval = 5000 // after 1 hour idle, a keep-alive is sent every 5 seconds
}; 
var cluster = new Cluster(config); 
var bucket = cluster.OpenBucket(); 

This assumes you are using version 2.1.0 or later of the client. If you are not, you can do the same thing through ServicePointManager:

//setting keep-alive time to 200 seconds 
ServicePointManager.SetTcpKeepAlive(true, 200000, 1000); 

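You must set it to a value lower than whatever the AWS LB is set to (I believe that is 60 seconds).

You should probably also set your connection pool minimum and maximum a bit lower, to something like 5 and 10.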
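
Putting the two suggestions above together, a configuration might look roughly like the sketch below. This is only an illustration, not a verified fix: the 60-second idle timeout is the unconfirmed figure mentioned above, halving it for the keep-alive time is an arbitrary safety margin, and the class name `KeepAliveConfigSketch`, the constant `AssumedIdleTimeoutMs`, and the bucket name `"default"` are placeholders introduced here.

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;

public static class KeepAliveConfigSketch
{
    // Assumption: whatever sits between client and server drops idle
    // connections after about 60 seconds. Replace with the real value.
    private const uint AssumedIdleTimeoutMs = 60 * 1000;

    public static Cluster CreateCluster(List<Uri> serverList)
    {
        var config = new ClientConfiguration
        {
            Servers = serverList,
            EnableTcpKeepAlives = true,
            // Send the first keep-alive well before the assumed idle timeout.
            TcpKeepAliveTime = AssumedIdleTimeoutMs / 2,
            // Interval between subsequent keep-alives, in milliseconds.
            TcpKeepAliveInterval = 5000,
            BucketConfigs = new Dictionary<string, BucketConfiguration>
            {
                {
                    "default", // placeholder bucket name
                    new BucketConfiguration
                    {
                        BucketName = "default",
                        // Small pool, as suggested above, so fewer connections sit idle.
                        PoolConfiguration = new PoolConfiguration
                        {
                            MinSize = 5,
                            MaxSize = 10
                        }
                    }
                }
            }
        };
        return new Cluster(config);
    }
}
```

The keep-alive time here is deliberately shorter than in the blog example, since a 60-minute value would never fire before a 60-second idle timeout closes the connection.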

+0

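Hello @jeffrymorris. Thanks for your answer, but unfortunately the changes you suggested did not fix the problem. The Couchbase server is not behind an AWS load balancer, so that cannot be the source. We have also reduced the number of connections, but still no luck. The Couchbase server is installed on an Ubuntu instance. Do you know whether we need to change anything at the operating system level? –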

+0

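We have monitored the TCP connections, and with a minimum of 5 and a maximum of 20 connections we only see 3 ports open (https://www.dropbox.com/s/fkw0rika8a8wtv1/Screenshot%202015-07-14%2018.25.01.png?dl=0). A few minutes ago there were 4 ports, and when the exception occurred one of them disappeared. The main problem is that they are not re-created either, and when this happens our database responds very slowly. What do you think? –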

+0

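@RaduCotofana - What version of the server are you using? Also, if you enable client logging (http://docs.couchbase.com/developer/dotnet-2.1/setting-up-logging.html), you should be able to log the actual exception that triggers the NodeUnavailableException. The slow response could be the connections timing out and failing, and then rebuilding themselves... that takes 15-20 seconds. – jeffrymorris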

0

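The problem is not completely solved, since we still get timeouts, although at a lower rate. We did improve performance by using the ClusterHelper singleton instance, as follows: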

ClusterHelper.Initialize(
      new ClientConfiguration 
      { 
       Servers = serverList, 
       UseSsl = false, 
       DefaultOperationLifespan = 2500, 
       EnableTcpKeepAlives = true, 
       TcpKeepAliveTime = 1000*60*60, 
       TcpKeepAliveInterval = 5000, 
       BucketConfigs = new Dictionary<string, BucketConfiguration> 
       { 
        { 
         "default", 
         new BucketConfiguration 
         { 
          BucketName = "default", 
          UseSsl = false, 
          Password = "", 
          PoolConfiguration = new PoolConfiguration 
          { 
           MaxSize = 50, 
           MinSize = 10 
          } 
         } 
        } 
       } 
      });
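
For completeness, here is a rough sketch of how the initialized ClusterHelper could then be used for reads. It assumes `ClusterHelper.Initialize` has already been called once at application start, as above, and that the documents are stored as strings. The class name `ClusterHelperUsageSketch` and the method names are made up for illustration.

```csharp
using System;
using Couchbase;

public static class ClusterHelperUsageSketch
{
    // Returns the stored value, or null if the operation failed.
    public static string GetOrNull(string key)
    {
        // ClusterHelper hands back the same bucket instance on every call,
        // so connections are reused instead of being opened per request.
        var bucket = ClusterHelper.GetBucket("default");
        var result = bucket.Get<string>(key);

        if (result.Success)
        {
            return result.Value;
        }

        // Log status and exception so that failures such as
        // NodeUnavailableException show up with their underlying cause.
        Console.Error.WriteLine("Get({0}) failed: {1} {2}",
            key, result.Status, result.Exception);
        return null;
    }

    // Call once when the application shuts down.
    public static void Shutdown()
    {
        ClusterHelper.Close();
    }
}
```

Checking `result.Success` and logging `result.Exception` on every failed operation may also help narrow down the remaining timeouts.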