I am getting occasional 500 TransportError responses from our AWS managed Elasticsearch domain when posting JSON documents from our Python-based Lambdas. This was working flawlessly until today's upgrade to ES 7.9.
We have tried increasing the cluster size as per the AWS best practices. The error goes away for a while and then pops up again after a couple of hours. We are sending about 220 POST requests to the ES service every 15 minutes.
Python stack trace:
[WARNING] 2020-12-04T15:27:27.558Z c4bad482-15ba-4a59-ac00-018a3dc893a4 POST https://my_aws_domain.es.amazonaws.com:443/<my_index>/_doc [status:500 request:0.224s]
[ERROR] TransportError: TransportError(500, 'exception', 'java.io.OptionalDataException')
Traceback (most recent call last):
  File "/var/task/app.py", line 123, in lambda_handler
    raise error
  File "/var/task/app.py", line 115, in lambda_handler
    resp = elastic.push_to_elastic(ES_ENDPOINT, ES_INDEX, es_secrets,
  File "/opt/python/my_utils/elasticsearch_helper.py", line 30, in push_to_elastic
    resp = es_client.index(index=es_index,
  File "/opt/python/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/opt/python/elasticsearch/client/__init__.py", line 391, in index
    return self.transport.perform_request(
  File "/opt/python/elasticsearch/transport.py", line 392, in perform_request
    raise e
  File "/opt/python/elasticsearch/transport.py", line 358, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/opt/python/elasticsearch/connection/http_urllib3.py", line 269, in perform_request
    self._raise_error(response.status, raw_data)
  File "/opt/python/elasticsearch/connection/base.py", line 300, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)
Elasticsearch application log stack trace from AWS:
[2020-12-04T13:26:16,650][WARN ][r.suppressed ] [6d8f8982081d9147943daa837a26de91] path: __PATH__ params: {index=<my_index>, op_type=create}
org.elasticsearch.transport.RemoteTransportException: [957c78aec7f1d2570150317d406e8b46][__IP__][__PATH__[s]]
Caused by: org.elasticsearch.ElasticsearchException: java.io.OptionalDataException
__AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:263) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:255) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:176) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:93) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:76) [transport-netty4-client-7.9.1.jar:7.9.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:?]
    at java.util.HashSet.readObject(HashSet.java:341) ~[?:?]
    at jdk.internal.reflect.GeneratedMethodAccessor78.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160) ~[?:?]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2271) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2142) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646) ~[?:?]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410) ~[?:?]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2304) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2142) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:?]
__AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL____AMAZON_INTERNAL__
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:263) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:255) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:176) ~[elasticsearch-7.9.1.jar:7.9.1]
    at [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[elasticsearch-7.9.1.jar:7.9.1]
    at <truncating> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) ~[?:?]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) ~[?:?]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
This seems very similar to this issue. We consistently get a 50% success rate while this error is occurring. Lambda retries seem to take care of the failed requests: whenever a request fails it is retried, and it goes through on the retry. So it isn't data related.
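For reference, the retry behaviour described above can also be sketched inside the function itself rather than relying on Lambda-level retries. This is only a minimal illustration under assumptions, not our actual code: FakeTransportError is a hypothetical stand-in for elasticsearch.TransportError (which exposes status_code), index_with_retry and make_flaky_sender are made-up names, and the attempt count and delays are arbitrary. In a real handler, send would wrap the es_client.index(...) call.

```python
import time


class FakeTransportError(Exception):
    """Hypothetical stand-in for elasticsearch.TransportError."""

    def __init__(self, status_code, error):
        super().__init__(status_code, error)
        self.status_code = status_code
        self.error = error


def index_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry on HTTP 500 with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except FakeTransportError as exc:
            # Retry only server-side 500s; re-raise other errors
            # and the final failed attempt.
            if exc.status_code != 500 or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_sender(fail_times):
    """Build a sender that raises a 500 fail_times times, then succeeds."""
    state = {"calls": 0}

    def send():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise FakeTransportError(500, "exception")
        return {"result": "created", "attempts": state["calls"]}

    return send
```

Since the same document succeeds on a later attempt, a wrapper like this only masks the symptom; it does not explain the OptionalDataException on the cluster side.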
Could someone please explain what could be going on?