Exceptions in the logs in Elasticsearch 5.6.9


(shashank) #1

I have a single-node cluster deployment on Elasticsearch 5.6.9.

During and after ES startup, I continuously see a lot of these messages in the logs.

Any help on how to debug this further?

2018-06-26 08:23:39,623 [orker.default]][T#2]] [ ] [ ] [ ] (ansport.netty4.Netty4Transport) WARN - exception caught on transport layer [[id: 0x7e1b44e8, L:0.0.0.0/0.0.0.0:9300]], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (16,3,3,0)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) ~[netty-codec-4.1.13.Final.jar:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392) ~[netty-codec-4.1.13.Final.jar:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359) ~[netty-codec-4.1.13.Final.jar:?]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342) ~[netty-codec-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:167) ~[netty-handler-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1329) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:908) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:744) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.13.Final.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[netty-common-4.1.13.Final.jar:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[netty-transport-4.1.13.Final.jar:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[netty-common-4.1.13.Final.jar:?]
at java.lang.Thread.run(Thread.java:811) [?:2.9 (11-22-2017)]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (16,3,3,0)
at org.elasticsearch.transport.TcpTransport.validateMessageHeader(TcpTransport.java:1293) ~[elasticsearch-5.6.9.jar:?]
at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:36) ~[transport-netty4-client-5.6.9.jar:?]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.13.Final.jar:?]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[netty-codec-4.1.13.Final.jar:?]
... 24 more


(David Pilato) #2

My guess is that you have an older version running somewhere that tries to connect to this cluster, or you are using an old Java TransportClient.


(shashank) #3

Hi David,

I am quite sure that neither an older version nor the transport client is the issue.

Also, I see the exception below, wherein ES is not able to assign shards for one of the indices. Are there any limits suggested by ES with respect to virtual address space usage in 5.x? I was able to get this working when we were on version 2.4.6 of ES.

2018-06-26 09:07:55,073 [ice#updateTask][T#1]] [ ] [ ] [ ] (cluster.service.ClusterService) TRACE - cluster state updated, source [cluster_update_settings]
cluster uuid: dOs6CYDwSTynjVWroC6xrA
version: 80
state uuid: _x4NcHFhROWKdFgYDUvXag
from_diff: false
meta data version: 76
[help/aJrEXiVsSEqEb1SJ29fniw]: v[11]
0: p_term [5], isa_ids [FTDXEc2PTgeTkrQsd8LSrQ]
[data/f4XPISkZSi-5ZR6EP_E3Xg]: v[92]
0: p_term [5], isa_ids [AiXjk7wRS5y9jL23oveavA]
[rule/JQaBovrPShqvLUjN7Riodw]: v[561]
0: p_term [20], isa_ids [7wsrtbE8TraUi7ojm45KtQ]
[work/T1C0eAMfSq2jutByJaH2uw]: v[11]
0: p_term [5], isa_ids [5n-wXMvQTHWe06vwM-MHhg]
nodes:
{4A55AF6DE60789BD3686EBD870014F78}{2erYQJHXQHe4AUIccdNacw}{JToVWFIEQvqU4JA8IWrrmQ}{10.1.99.151}{10.1.99.151:9300}, local, master
routing_table (version 21):
-- index [[help/aJrEXiVsSEqEb1SJ29fniw]]
----shard_id [help][0]
--------[help][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=FTDXEc2PTgeTkrQsd8LSrQ]

-- index [[data/f4XPISkZSi-5ZR6EP_E3Xg]]
----shard_id [data][0]
--------[data][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=AiXjk7wRS5y9jL23oveavA]

-- index [[work/T1C0eAMfSq2jutByJaH2uw]]
----shard_id [work][0]
--------[work][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=5n-wXMvQTHWe06vwM-MHhg]

-- index [[rule/JQaBovrPShqvLUjN7Riodw]]
----shard_id [rule][0]
--------[rule][0], node[null], [P], recovery_source[existing recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-06-26T07:26:53.165Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[rule][0]: Recovery failed on {4A55AF6DE60789BD3686EBD870014F78}{2erYQJHXQHe4AUIccdNacw}{JToVWFIEQvqU4JA8IWrrmQ}{10.1.99.151}{10.1.99.151:9300}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: IOException[EDC5124I Too many open files. (errno2=0x07360344): MMapIndexInput(path="/lbcell3/WASL023/temp/SearchIndex/nodes/0/indices/JQaBovrPShqvLUjN7Riodw/0/index/_1b.cfs") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 68601012 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'). More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]]; ], allocation_status[deciders_no]]

routing_nodes:
-----node_id[2erYQJHXQHe4AUIccdNacw][V]
--------[help][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=FTDXEc2PTgeTkrQsd8LSrQ]
--------[data][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=AiXjk7wRS5y9jL23oveavA]
--------[work][0], node[2erYQJHXQHe4AUIccdNacw], [P], s[STARTED], a[id=5n-wXMvQTHWe06vwM-MHhg]
---- unassigned
--------[rule][0], node[null], [P], recovery_source[existing recovery], s[UNASSIGNED], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-06-26T07:26:53.165Z], failed_attempts[5], delayed=false, details[failed recovery, failure RecoveryFailedException[[rule][0]: Recovery failed on {4A55AF6DE60789BD3686EBD870014F78}{2erYQJHXQHe4AUIccdNacw}{JToVWFIEQvqU4JA8IWrrmQ}{10.1.99.151}{10.1.99.151:9300}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: IOException[EDC5124I Too many open files. (errno2=0x07360344): MMapIndexInput(path="/lbcell3/WASL023/temp/SearchIndex/nodes/0/indices/JQaBovrPShqvLUjN7Riodw/0/index/_1b.cfs") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 68601012 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'). More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]]; ], allocation_status[deciders_no]]
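The root cause in the log above is an OS-level one: "Too many open files" while mmap-ing a segment, and the Lucene message itself suggests reviewing the process limits. As a first check, the snippet below (a minimal sketch, assuming a POSIX shell run as the user that starts Elasticsearch) prints the limits that the error message refers to:

```shell
# Print the per-process limits the Lucene/mmap error message asks about.
# Run this in the shell environment of the user that launches Elasticsearch,
# since limits are per-user/per-process, not system-wide.
ulimit -n   # max open file descriptors; the 5.x docs recommend at least 65536
ulimit -v   # virtual address space; should print "unlimited" for MMapDirectory
ulimit -m   # max resident memory; should also print "unlimited"
```

If `-v` or `-m` is capped, or `-n` is low (e.g. the common default of 1024), raising them in `/etc/security/limits.conf` or the service unit before restarting the node is the usual fix.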


(David Pilato) #4

This:

2018-06-26 08:23:39,623 [orker.default]][T#2]] [ ] [ ] [ ] (ansport.netty4.Netty4Transport) WARN - exception caught on transport layer [[id: 0x7e1b44e8, L:0.0.0.0/0.0.0.0:9300]], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (16,3,3,0)

is totally unrelated, IMO, to:

{10.1.99.151}{10.1.99.151:9300}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to open reader on writer]; nested: IOException[EDC5124I Too many open files. (errno2=0x07360344): MMapIndexInput(path="/lbcell3/WASL023/temp/SearchIndex/nodes/0/indices/JQaBovrPShqvLUjN7Riodw/0/index/_1b.cfs") [this may be caused by lack of enough unfragmented virtual address space or too restrictive virtual memory limits enforced by the operating system, preventing us to map a chunk of 68601012 bytes. Please review 'ulimit -v', 'ulimit -m' (both should return 'unlimited'). More information: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]]; ], allocation_status[deciders_no]]

Here you most likely have a node with too many shards on it. That's just a guess.
What is the output of:

GET _cat/health?v
GET _cat/indices?v
GET _cat/nodes?v

(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.