Getting TransportError(500, 'exception', 'java.io.OptionalDataException') while posting to AWS managed Elasticsearch

I am getting occasional 500 TransportErrors on an AWS managed Elasticsearch domain when posting JSONs from our Python-based Lambdas. This was working flawlessly until today's upgrade to ES 7.9.

We have tried increasing the cluster size as per AWS best practices. The error goes away for a while and pops up again after a couple of hours. We are sending about 220 POST requests to the ES service every 15 minutes.

Python stack trace:

[WARNING]	2020-12-04T15:27:27.558Z	c4bad482-15ba-4a59-ac00-018a3dc893a4	POST https://my_aws_domain.es.amazonaws.com:443/<my_index>/_doc [status:500 request:0.224s]
[ERROR] TransportError: TransportError(500, 'exception', 'java.io.OptionalDataException')
Traceback (most recent call last):
  File "/var/task/app.py", line 123, in lambda_handler
    raise error
  File "/var/task/app.py", line 115, in lambda_handler
    resp = elastic.push_to_elastic(ES_ENDPOINT, ES_INDEX, es_secrets,
  File "/opt/python/my_utils/elasticsearch_helper.py", line 30, in push_to_elastic
    resp = es_client.index(index=es_index,
  File "/opt/python/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/opt/python/elasticsearch/client/__init__.py", line 391, in index
    return self.transport.perform_request(
  File "/opt/python/elasticsearch/transport.py", line 392, in perform_request
    raise e
  File "/opt/python/elasticsearch/transport.py", line 358, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/opt/python/elasticsearch/connection/http_urllib3.py", line 269, in perform_request
    self._raise_error(response.status, raw_data)
  File "/opt/python/elasticsearch/connection/base.py", line 300, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)

Elasticsearch application log stack trace from AWS:

[2020-12-04T13:26:16,650][WARN ][r.suppressed             ] [6d8f8982081d9147943daa837a26de91] path: __PATH__ params: {index=<my_index>, op_type=create}
org.elasticsearch.transport.RemoteTransportException: [957c78aec7f1d2570150317d406e8b46][__IP__][__PATH__[s]]
Caused by: org.elasticsearch.ElasticsearchException: java.io.OptionalDataException
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:263) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:255) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:176) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:93) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:76) [transport-netty4-client-7.9.1.jar:7.9.1]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
    at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:?]
    at java.util.HashSet.readObject(HashSet.java:341) ~[?:?]
    at jdk.internal.reflect.GeneratedMethodAccessor78.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1160) ~[?:?]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2271) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2142) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646) ~[?:?]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410) ~[?:?]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2304) ~[?:?]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2142) ~[?:?]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:464) ~[?:?]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) ~[?:?]
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
__AMAZON_INTERNAL__
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:263) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:255) [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:176) ~[elasticsearch-7.9.1.jar:7.9.1]
    at [elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[elasticsearch-7.9.1.jar:7.9.1]
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[elasticsearch-7.9.1.jar:7.9.1]
    at <truncating> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) ~[?:?]
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) ~[?:?]
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]

This seems very similar to this issue. We see a consistent 50% success rate when the error occurs. Lambda retries seem to take care of the failed requests: anything that fails goes through on a retry, so it isn't data-related.
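In our case the retries come from Lambda's built-in async retry behaviour, but the same idea can be sketched as a small wrapper. This is a hypothetical helper, not our production code; the retry count, backoff values, and exception list are assumptions:

```python
import time


def retry_with_backoff(fn, retries=3, base_delay=1.0, exceptions=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            time.sleep(base_delay * (2 ** attempt))


# Hypothetical usage with the elasticsearch client, so a transient 500 is retried:
# from elasticsearch.exceptions import TransportError
# retry_with_backoff(lambda: es_client.index(index=es_index, body=doc),
#                    exceptions=(TransportError,))
```

Since the failures here are intermittent rather than data-related, a retry like this (or Lambda's own retry) masks them, but it obviously doesn't fix the underlying 7.9 issue.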

Could someone please explain what could be going on?

If this is the AWS ES service you may need to ask them directly; they run a fork that we aren't able to help with, sorry to say.

Did you look at Cloud by Elastic, also available if needed from AWS Marketplace?

Cloud by Elastic is one way to have access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, Maps UI, Alerting, and the built-in solutions named Observability, Security, and Enterprise Search, and what is coming next :slight_smile: ...


AWS support got back to us. They said it's an issue with 7.9 and asked us to downgrade to 7.7 by taking a manual snapshot.
So much for being a managed service.
I will update here if that solves the issue.

You can't downgrade.


AWS support said it's possible, per:
https://aws.amazon.com/elasticsearch-service/faqs/

" Q: Can I downgrade to previous version if I’m not comfortable with the new version?

No. If you need to downgrade to an older version, you must take a snapshot of your upgraded domain and restore it to a domain that uses the older Elasticsearch version."

Update: I am not just quoting the docs; their support person told us that it's possible.

A snapshot created in a newer version cannot be read by older versions, so that sounds wrong.


Thanks for the heads-up!!

What we ended up doing was extracting the _ids of all documents from the old AES 7.9 cluster using the scroll API (via the elasticsearch_dsl Python library).

Then we extracted the actual '_source' JSON with an mget query on the old cluster using those ids, and pushed that JSON to our new AES 7.7 cluster (using the elasticsearch Python library).

I am sure there is an easier way to do this, but this is what we could script quickly.
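For anyone wanting to do the same, the two steps above can be sketched roughly as follows. Function names, the batch size, and the match_all query are made up for illustration; it assumes the elasticsearch Python client (imported lazily so the pure helper stands on its own):

```python
def chunked(items, size):
    """Split a list of ids into mget-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def migrate_index(source_host, dest_host, index, batch_size=500):
    """Copy all docs of one index between clusters via scroll + mget."""
    # Lazy import so the helper above doesn't require the client to be installed.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    source = Elasticsearch(source_host)
    dest = Elasticsearch(dest_host)

    # 1) Collect all _ids from the old cluster via the scroll API.
    ids = [hit["_id"] for hit in scan(source, index=index, _source=False,
                                      query={"query": {"match_all": {}}})]

    # 2) Fetch _source in batches with mget and push each doc to the new cluster.
    for batch in chunked(ids, batch_size):
        docs = source.mget(index=index, body={"ids": batch})["docs"]
        for doc in docs:
            if doc.get("found"):
                dest.index(index=index, id=doc["_id"], body=doc["_source"])
```

Note this copies documents only, not mappings or settings, and as noted below, reindex from remote does the same job in one API call.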

Reindex from remote does that for you.
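For reference, reindex from remote is a single call to the _reindex API on the destination cluster. A hedged sketch with the elasticsearch Python client; the helper, hosts, and credentials are placeholders, and note that self-managed destinations must also list the source host in reindex.remote.whitelist:

```python
def remote_reindex_body(source_host, index, username, password):
    """Build the request body for a reindex-from-remote _reindex call."""
    return {
        "source": {
            "remote": {
                "host": source_host,       # old cluster, reachable from the new one
                "username": username,
                "password": password,
            },
            "index": index,
        },
        "dest": {"index": index},          # index on the destination cluster
    }


# Hypothetical usage (placeholder endpoints):
# from elasticsearch import Elasticsearch
# dest = Elasticsearch("https://new-77-domain.es.amazonaws.com:443")
# dest.reindex(
#     body=remote_reindex_body("https://old-79-domain.es.amazonaws.com:443",
#                              "my_index", "user", "pass"),
#     wait_for_completion=False)  # returns a task id you can poll
```

Unlike the scroll + mget script, this runs server-side on the destination cluster, which is why it is so much faster.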


Thanks for the idea.
I was reading its documentation. Is it possible to perform a reindex from remote with the source being a 7.9 cluster and the destination a 7.7 cluster?

Yes.

@dadoonet thank you for the idea on reindex from remote. It was blazingly fast, given that Elasticsearch was designed to handle billions of documents.

Pro tip for anyone else performing a remote reindex: temporarily set replicas to zero on the destination index to speed things up.
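Concretely, that is an index-settings update on the destination before the reindex, restored afterwards. A sketch (the helper and index name are hypothetical; disabling refresh as well is a common companion tweak, also an assumption here):

```python
def replica_settings(num_replicas, refresh_interval="1s"):
    """Index-settings body for toggling replicas/refresh around a bulk reindex."""
    return {
        "index": {
            "number_of_replicas": num_replicas,
            "refresh_interval": refresh_interval,  # "-1" disables refresh entirely
        }
    }


# Hypothetical usage with the elasticsearch client on the destination cluster:
# dest.indices.put_settings(index="my_index", body=replica_settings(0, "-1"))
# ... run the remote reindex ...
# dest.indices.put_settings(index="my_index", body=replica_settings(1, "1s"))
```

With zero replicas, every document is written once instead of once per replica; the replicas are then rebuilt in one pass when the setting is restored.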

Possibly related issue: https://discuss.elastic.co/t/temp-downtimes-optionaldataexception-500-on-all-apis-aws/257571/10:

[AWS support] they have deployed a release candidate on my cluster.