Ping and cluster node stats requests always time out


#1

I am running Elasticsearch 2.3.5 on a five-node cluster. As soon as I start the nodes, timeouts keep showing up in the logs, and I don't know why. First of all, it is not GC: GC activity is very low. Could Shield or Marvel be the problem? I am opening this topic in the hope of getting some help.


[2016-09-22 13:24:33,478][DEBUG][action.admin.cluster.node.stats] [q1den2nres101] failed to execute on node [NMpHYOMcQ7OUruj2p2SLpw]

[2016-09-22 12:53:39,850][WARN ][shield.transport ] [q1den2nres101] Received response for a request that has timed out, sent [44211ms] ago, timed out [14210ms] ago, action [internal:discovery/zen/fd/ping], node [{q1den2nres105}{4BczsyGLSmeaXfTosAsX1w}{10.68.52.105}{10.68.52.105:9300}], id [367854]

[2016-09-22 14:09:35,344][DEBUG][action.search ] [q1den2nres101] [3656] Failed to execute query phase
RemoteTransportException[[q1den2nres104][10.68.52.104:9300][indices:data/read/search[phase/query+fetch/scroll]]]; nested: SearchContextMissingException[No search context found for id [3656]];
Caused by: SearchContextMissingException[No search context found for id [3656]]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:613)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:542)

[2016-09-22 14:09:35,347][ERROR][shield.authc.esnative ] [q1den2nres101] error occurred while checking the native users for changes
Failed to execute phase [query_fetch], all shards failed; shardFailures {RemoteTransportException[[q1den2nres104][10.68.52.104:9300][indices:data/read/search[phase/query+fetch/scroll]]]; nested: SearchContextMissingException[No search context found for id [3656]]; }
at org.elasticsearch.action.search.SearchScrollQueryAndFetchAsyncAction.onPhaseFailure(SearchScrollQueryAndFetchAsyncAction.java:155)
Caused by: RemoteTransportException[[q1den2nres104][10.68.52.104:9300][indices:data/read/search[phase/query+fetch/scroll]]]; nested: SearchContextMissingException[No search context found for id [3656]];
Caused by: SearchContextMissingException[No search context found for id [3656]]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:613)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:542)

[2016-09-22 18:04:56,011][WARN ][shield.transport ] [q1den2nres102] Received response for a request that has timed out, sent [50024ms] ago, timed out [20023ms] ago, action [internal:discovery/zen/fd/master_ping], node [{q1den2nres101}{0hm2kRvqSDexAkQn2orLNg}{10.68.52.101}{10.68.52.101:9300}], id [163984]

[2016-09-23 11:30:38,851][TRACE][discovery.zen.fd ] [q1den2nres104] [master] failed to ping [{q1den2nres103}{5kVVQUnTSuSaXQFZTboveg}{10.68.52.103}{10.68.52.103:9300}], retry [1] out of [3]
ReceiveTimeoutTransportException[[q1den2nres103][10.68.52.103:9300][internal:discovery/zen/fd/master_ping] request_id [37155] timed out after [30000ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:679)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-09-23 11:31:08,852][TRACE][discovery.zen.fd ] [q1den2nres104] [master] failed to ping [{q1den2nres103}{5kVVQUnTSuSaXQFZTboveg}{10.68.52.103}{10.68.52.103:9300}], retry [2] out of [3]
ReceiveTimeoutTransportException[[q1den2nres103][10.68.52.103:9300][internal:discovery/zen/fd/master_ping] request_id [37185] timed out after [30000ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:679)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2016-09-23 11:31:33,040][WARN ][shield.transport ] [q1den2nres104] Received response for a request that has timed out, sent [84189ms] ago, timed out [54189ms] ago, action [internal:discovery/zen/fd/master_ping], node [{q1den2nres103}{5kVVQUnTSuSaXQFZTboveg}{10.68.52.103}{10.68.52.103:9300}], id [37155]

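
For anyone reading similar logs: the `Received response for a request that has timed out` warnings carry two numbers, how long ago the request was sent and how long ago it timed out. Their difference is the timeout that was in effect, and the first number is how long the reply actually took. Below is a minimal Python sketch that pulls these out of such a line. The function name `summarize_late_response` and the regular expression are my own assumptions based only on the log format shown above, not anything shipped with Elasticsearch.

```python
import re

# Hypothetical pattern, written against the WARN lines quoted in this topic.
LATE_RESPONSE = re.compile(
    r"sent \[(?P<sent>\d+)ms\] ago, timed out \[(?P<late>\d+)ms\] ago, "
    r"action \[(?P<action>[^\]]+)\], node \[\{(?P<node>[^}]+)\}"
)


def summarize_late_response(line):
    """Return timing details for a late-response warning, or None if no match."""
    match = LATE_RESPONSE.search(line)
    if match is None:
        return None
    sent_ms = int(match.group("sent"))
    late_ms = int(match.group("late"))
    return {
        "action": match.group("action"),
        "node": match.group("node"),
        # Time the request waited before it was declared timed out.
        "timeout_ms": sent_ms - late_ms,
        # Time until the response finally arrived.
        "response_ms": sent_ms,
    }


if __name__ == "__main__":
    sample = (
        "[2016-09-22 12:53:39,850][WARN ][shield.transport ] [q1den2nres101] "
        "Received response for a request that has timed out, sent [44211ms] ago, "
        "timed out [14210ms] ago, action [internal:discovery/zen/fd/ping], "
        "node [{q1den2nres105}{4BczsyGLSmeaXfTosAsX1w}{10.68.52.105}"
        "{10.68.52.105:9300}], id [367854]"
    )
    print(summarize_late_response(sample))
```

Run against the warnings above, the difference comes out at roughly 30 seconds each time, which agrees with the `timed out after [30000ms]` in the `ReceiveTimeoutTransportException` entries, while the replies themselves took between 44 and 84 seconds to arrive.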

(Mark Walkom) #2

Are all your nodes in the same datacentre?


#3

Yes, they are. And the problem is resolved: the cause was simply that many connections on port 9300 were disconnecting from the ES cluster.
Thanks.


(Thomas Decaux) #4

Hello,

I see this problem too when we run big queries (> 20 seconds).

Does it have something to do with the machine's network settings?

Thank you,


#5

No, but you had better disable sniffing on all clients.


#6

I'm getting the same error. Is this something I need to fix in elasticsearch.yaml or filebeat.yaml?

