appfabric - Resolving AppFabric scaling problems (intermittent ErrorCode<ERRCA0017>:SubStatus<ES0006> errors)

Reposted. Author: 行者123. Updated: 2023-12-03 13:36:35

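We have implemented AppFabric Caching for Windows Server for our web application. Initially we were able to use the cache without any problems. We then increased traffic roughly 100-fold and started seeing intermittent exceptions. They occur about once every 2 days and last about a minute each time.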

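Our configuration:

9 web servers inserting/retrieving objects in the cache:
    Mostly transient 500-byte action-type objects
    Using 1 named region
    Objects stored using tags
    Bulk retrieval for a given tag

Cache cluster:
    1 host (the lead host) running AppFabric 1.1 (get-cachehost reports version 3)
    SQL configuration provider
    96GB of RAM on the host, with the default 50% (48GB) allocated to AppFabric
    Cache host config
    Cache client config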

Sequence of errors (within a one-minute window, the exceptions occur on every one of the nine web servers):


System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host

Microsoft.ApplicationServer.Caching.DataCacheException:ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown. ---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:15:00'. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
--- End of inner exception stack trace ---
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.FramingDuplexSessionChannel.EndReceive(IAsyncResult result)
at Microsoft.ApplicationServer.Caching.WcfClientChannel.CompleteProcessing(IAsyncResult result)
--- End of inner exception stack trace ---
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody)
at Microsoft.ApplicationServer.Caching.DataCache.GetNextBatch(String region, DataCacheTag[] tags, GetByTagsOperation op, IMonitoringListener listener, Byte[][]& state, Boolean& more)
at Microsoft.ApplicationServer.Caching.CacheEnumerator.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.<ExceptIterator>d__99`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

Microsoft.ApplicationServer.Caching.DataCacheException:
ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.)
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody)
at Microsoft.ApplicationServer.Caching.DataCache.GetNextBatch(String region, DataCacheTag[] tags, GetByTagsOperation op, IMonitoringListener listener, Byte[][]& state, Boolean& more)
at Microsoft.ApplicationServer.Caching.CacheEnumerator.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.<ExceptIterator>d__99`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

Microsoft.ApplicationServer.Caching.DataCacheException:
ErrorCode<ERRCA0018>:SubStatus<ES0001>:The request timed out.
at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody)
at Microsoft.ApplicationServer.Caching.DataCache.GetNextBatch(String region, DataCacheTag[] tags, GetByTagsOperation op, IMonitoringListener listener, Byte[][]& state, Boolean& more)
at Microsoft.ApplicationServer.Caching.CacheEnumerator.MoveNext()
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.<ExceptIterator>d__99`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)



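We also created a tracelog session on the cache server to capture more information for diagnosing the problem. Any suggestions on how to analyze it would be much appreciated (I can provide the trace if needed).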

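We also monitored various AppFabric, CLR and network performance counters. Below is a screenshot taken during an incident:

[Screenshot: performance counters at the time of the incident]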

Thanks in advance for any ideas or suggestions on resolving this issue.

Update 1

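Below are the exceptions that occurred back to back on the AppFabric cache server during the intermittent error (extracted from the trace log):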


System.ServiceModel.CommunicationException: The socket connection was aborted because an asynchronous send to the socket did not complete within the allotted timeout of 00:00:00.0082078. The time allotted to this operation may have been a portion of a longer timeout. ---> System.ObjectDisposedException: The socket connection has been disposed. Object name: 'System.ServiceModel.Channels.SocketConnection'.
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.SocketConnection.ThrowIfNotOpen()
at System.ServiceModel.Channels.SocketConnection.BeginRead(Int32 offset, Int32 size, TimeSpan timeout, WaitCallback callback, Object state)
at System.ServiceModel.Channels.SessionConnectionReader.BeginReceive(TimeSpan timeout, WaitCallback callback, Object state)
at System.ServiceModel.Channels.SynchronizedMessageSource.ReceiveAsyncResult.PerformOperation(TimeSpan timeout)
at System.ServiceModel.Channels.SynchronizedMessageSource.SynchronizedAsyncResult`1..ctor(SynchronizedMessageSource syncSource, TimeSpan timeout, AsyncCallback callback, Object state)
at System.ServiceModel.Channels.FramingDuplexSessionChannel.BeginReceive(TimeSpan timeout, AsyncCallback callback, Object state)
at Microsoft.ApplicationServer.Caching.WcfServerChannel.CompleteProcessing(IAsyncResult result)

System.ServiceModel.CommunicationObjectAbortedException: The communication object, System.ServiceModel.Channels.ServerSessionPreambleConnectionReader+ServerFramingDuplexSessionChannel, cannot be used for communication because it has been Aborted.
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.FramingDuplexSessionChannel.OnEndSend(IAsyncResult result)
at Microsoft.ApplicationServer.Caching.ReplyContext.EndSend(IAsyncResult result)

System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServerSessionPreambleConnectionReader+ServerFramingDuplexSessionChannel, cannot be used for communication because it is in the Faulted state.
at System.ServiceModel.Channels.CommunicationObject.ThrowIfDisposedOrNotOpen()
at System.ServiceModel.Channels.OutputChannel.Send(Message message, TimeSpan timeout)
at Microsoft.ApplicationServer.Caching.ReplyContext.Reply(Message message, TimeSpan timeout)

System.TimeoutException: Sending to via http://www.w3.org/2005/08/addressing/anonymous timed out after 00:00:15. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: Cannot claim lock within the allotted timeout of 00:00:15. The time allotted to this operation may have been a portion of a longer timeout.
--- End of inner exception stack trace ---
at System.ServiceModel.Channels.FramingDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)
at System.ServiceModel.Channels.OutputChannel.Send(Message message, TimeSpan timeout)
at Microsoft.ApplicationServer.Caching.ReplyContext.Reply(Message message, TimeSpan timeout)


Update 2

After a day of troubleshooting we took the following steps, which brought some improvement:


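Based on two linked references, we increased maxConnectionsToServer to 3. As a result we got 50% more client requests per second, as recorded by the AppFabric Caching:Cache performance counter, but the intermittent error did not stop.
In the cache server configuration we increased maxBufferSize and maxBufferPoolSize to 2147483647 (Int32.MaxValue). So far we have been able to handle 300 times the original traffic without the error. We will keep increasing traffic and monitoring. More updates to follow.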
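
For readers unfamiliar with where these settings live, the fragments below are a rough sketch, not the poster's actual configuration. The host name CacheHost1 is hypothetical, the port is the AppFabric default, and the exact surrounding elements may differ between AppFabric versions. On the client, maxConnectionsToServer is assumed to be an attribute of the dataCacheClient element:

```xml
<!-- Client side (web.config / app.config); host name is a placeholder -->
<dataCacheClients>
  <dataCacheClient name="default" maxConnectionsToServer="3">
    <hosts>
      <host name="CacheHost1" cachePort="22233" />
    </hosts>
  </dataCacheClient>
</dataCacheClients>
```

On the server, the buffer sizes are assumed to sit under advancedProperties in the cluster configuration (typically edited by exporting it with Export-CacheClusterConfig and re-importing it with Import-CacheClusterConfig):

```xml
<!-- Cluster configuration fragment; values taken from the update above -->
<advancedProperties>
  <transportProperties maxBufferPoolSize="2147483647" maxBufferSize="2147483647" />
</advancedProperties>
```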


Update 3

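We added two more hosts to the cluster, each with 16GB of RAM, and enabled High Availability mode (via Secondaries=1). For now the original host with 96GB of RAM remains in the cluster, and all hosts have cacheSize = 12 GB. On the cache clients we increased maxConnectionsToServer to 12 (1 per core). Our findings: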


Occasionally (once or twice every 10 minutes) we get:

ErrorCode<ERRCA0017>:SubStatus<ES0005>:There is a temporary failure. Please retry later. (There was a contention on the store.)
ErrorCode<ERRCA0017>:SubStatus<ES0004>:There is a temporary failure. Please retry later. (Replication queue was full. This may happen during reconfiguration of cluster hosts.)

As described above, the original 96GB cache host still suffers the 1-minute outages. The new cache hosts have not had an outage.


We plan to remove 80GB of RAM from the original cache host. More updates to follow.

Update 4

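Reducing the RAM in the cache host to 16GB appears to have resolved the problem. With traffic ramped up to 400 times the original level, we no longer see the intermittent errors. This one seems to be closed. On to the next issue: High Availability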

Best answer

Have you installed http://support.microsoft.com/kb/983182 and http://support.microsoft.com/kb/2527387?
In your code, do you catch the exception and check for the RetryLater error code?

catch (DataCacheException ex2)
{
    if (ex2.ErrorCode == DataCacheErrorCode.RetryLater)
    {
        // Transient failure: wait briefly, then retry the operation.
    }
}
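
A minimal sketch of how the RetryLater check might be wrapped into a reusable helper. The class name CacheRetry, the method names, the attempt count and the delay are all hypothetical choices rather than part of the AppFabric API. The stack traces in the question show the failure surfacing inside CacheEnumerator.MoveNext during ToList, which suggests the tag query is enumerated lazily, so this sketch materializes the results inside the retried delegate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Microsoft.ApplicationServer.Caching;

public static class CacheRetry
{
    // Hypothetical helper: runs a cache operation and retries it when the
    // server reports RetryLater. Other DataCacheExceptions are rethrown.
    public static T Execute<T>(Func<T> operation, int maxAttempts = 3, int baseDelayMs = 100)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (DataCacheException ex)
            {
                bool transient = ex.ErrorCode == DataCacheErrorCode.RetryLater;
                if (!transient || attempt >= maxAttempts)
                {
                    throw;
                }
                // Linear backoff before the next attempt (assumed policy).
                Thread.Sleep(baseDelayMs * attempt);
            }
        }
    }

    // Bulk retrieval by tag. ToList() runs inside the delegate so that a
    // failure during enumeration is also covered by the retry.
    public static List<KeyValuePair<string, object>> GetByTag(
        DataCache cache, DataCacheTag tag, string region)
    {
        return Execute(() => cache.GetObjectsByTag(tag, region).ToList());
    }
}
```

Only RetryLater is retried here. Whether ERRCA0016 or ERRCA0018 should also be retried depends on whether the operation is safe to repeat, since the ERRCA0016 message states that the result of the request is unknown.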


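Using a named region forces all values in that region onto a single server instead of spreading them by hash across all cache servers. (According to the documentation, objects in a region are limited to a single cache host in order to provide the added search capability: http://msdn.microsoft.com/en-us/library/ee790985(v=azure.10).aspx)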


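I suggest you shard your named region across 2 more servers and put them into the cluster. That way you confine the exceptions to smaller servers running GC, and you can try to find more RAM for storing the objects and tags.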
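
One possible reading of the sharding suggestion, sketched under stated assumptions: instead of one named region, the client spreads objects over several regions chosen by a stable hash of the key, and a tag query visits every shard. The class RegionSharding, the shard count and the "actions-N" naming scheme are hypothetical. The cluster decides which host owns each region, so this does not guarantee that every region lands on a different host.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ApplicationServer.Caching;

public static class RegionSharding
{
    // Hypothetical shard count and region naming scheme.
    private const int ShardCount = 3;

    private static string RegionName(int shard)
    {
        return "actions-" + shard;
    }

    // Picks a region with an FNV-1a-style hash over the key's UTF-16 code
    // units, because string.GetHashCode is not guaranteed to be stable
    // across processes.
    public static string RegionFor(string key)
    {
        uint hash = 2166136261;
        unchecked
        {
            foreach (char c in key)
            {
                hash = (hash ^ c) * 16777619;
            }
        }
        return RegionName((int)(hash % ShardCount));
    }

    // Creates the shard regions; CreateRegion returns false for a region
    // that already exists, which is ignored here.
    public static void EnsureRegions(DataCache cache)
    {
        for (int i = 0; i < ShardCount; i++)
        {
            cache.CreateRegion(RegionName(i));
        }
    }

    public static void Put(DataCache cache, string key, object value,
                           IEnumerable<DataCacheTag> tags)
    {
        cache.Put(key, value, tags, RegionFor(key));
    }

    // Tags are scoped to a region, so a bulk lookup queries each shard.
    public static List<KeyValuePair<string, object>> GetByTag(
        DataCache cache, DataCacheTag tag)
    {
        var results = new List<KeyValuePair<string, object>>();
        for (int i = 0; i < ShardCount; i++)
        {
            results.AddRange(cache.GetObjectsByTag(tag, RegionName(i)));
        }
        return results;
    }
}
```

The trade-off is that each bulk retrieval now costs one round trip per shard, in exchange for no single host having to hold the whole region.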

Regarding "appfabric - Resolving AppFabric scaling problems (intermittent ErrorCode<ERRCA0017>:SubStatus<ES0006> errors)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12273519/
