Elasticsearch Kibana ECONNRESET - SOLVED

Hi,

I am seeing an error page in Kibana and I am not sure what causes it. Please let me know if you have any idea what the reason could be.

Kibana is pointing at the same machine that it is hosted on.

cat /etc/kibana/kibana.yml 
server.host: 10.50.30.150
elasticsearch.url: http://10.50.30.106:9200

elasticsearch.requestTimeout: 3600000
pid.file: "/var/run/kibana/kibana.pid"
logging.dest: /var/log/kibana/kibana.log
logging.quiet: true

xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.apm.ui.enabled: false
xpack.watcher.enabled: false
xpack.security.enabled: false
xpack.monitoring.enabled: true

xpack.reporting.queue.timeout: 3600000
xpack.reporting.encryptionKey: "123456789"
xpack.reporting.csv.maxSizeBytes: 90857600
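
A quick sanity check that the endpoint configured in elasticsearch.url answers at all (illustrative probe; output not captured here):

curl -s 'http://10.50.30.106:9200/'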

Installed versions:

rpm -aq elasticsearch kibana xpack
elasticsearch-6.1.2-1.noarch
kibana-6.1.2-1.x86_64

Here is some info about the node that hosts both Kibana and Elasticsearch.

# netstat -anp | awk '/tcp/ {print $6 $7}' | sort | uniq -c
      1 ESTABLISHED-
      1 ESTABLISHED1059/elasticsearch_
     28 ESTABLISHED11533/oauth2_proxy
      1 ESTABLISHED19069/sshd:
    190 ESTABLISHED22603/java
      6 ESTABLISHED22674/node
      4 ESTABLISHED22966/nginx:
      4 ESTABLISHED22967/nginx:
      5 ESTABLISHED22968/nginx:
      4 ESTABLISHED22969/nginx:
      2 ESTABLISHED563/node_exporter
      2 LISTEN1017/master
      1 LISTEN1059/elasticsearch_
      2 LISTEN1081/sshd
      1 LISTEN11533/oauth2_proxy
      2 LISTEN1/systemd
      4 LISTEN22603/java
      1 LISTEN22674/node
      1 LISTEN22965/nginx:
      1 LISTEN563/node_exporter
      3 LISTEN910/unbound
     58 TIME_WAIT-

Kibana log:
tailf /var/log/kibana/kibana.log

{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["error","elasticsearch","admin"],"pid":9897,"message":"Request complete with error\nGET http://10.50.30.150:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:xpack_main@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:reporting@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:searchprofiler@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:tilemap@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:logstash@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-04-29T08:33:30Z","tags":["status","plugin:elasticsearch@6.1.2","error"],"pid":9897,"state":"red","message":"Status changed from red to red - Error: socket hang up","prevState":"red","prevMsg":{"code":"ECONNRESET"}}

Elasticsearch log:

tailf /var/log/elasticsearch/elasticsearch.log
[2018-04-29T06:26:56,271][INFO ][o.e.c.s.ClusterApplierService] [ip-10-50-30-150] detected_master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{--PfgRhQTj6GRF4JVG66wg}{10.50.40.180}{10.50.40.180:9300}, added {{ip-10-50-45-124}{76gR60gxSJCtiO69f5QITQ}{d6jF79r0TBa1yRIeQD8M8g}{10.50.45.124}{10.50.45.124:9300},{ip-10-50-40-233}{gWnBJuHxQT2J7C4OmrAgQQ}{K35r-ZkjQn-9iLWVgDsevA}{10.50.40.233}{10.50.40.233:9300},{ip-10-50-45-225}{hf9ZXod6SL27ZIFP5V0KCw}{BDiqgQA5SXSSOJAsdBjjaQ}{10.50.45.225}{10.50.45.225:9300},{ip-10-50-40-185}{1RpuRSi1QNmOf5X63ER6Sw}{Imexn5LDReyKrdSXHYy3_g}{10.50.40.185}{10.50.40.185:9300},{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{--PfgRhQTj6GRF4JVG66wg}{10.50.40.180}{10.50.40.180:9300},{ip-10-50-30-72}{Bix5ETL4S5KNyBFR2LgeKQ}{arNwhKlRT-erxk1RQVAKdQ}{10.50.30.72}{10.50.30.72:9300},{ip-10-50-30-106}{zIOpN_3XTxaVtps4sRsrag}{2DppOBqXTP2LbyEW6KsFkQ}{10.50.30.106}{10.50.30.106:9300},}, reason: apply cluster state (from master [master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{--PfgRhQTj6GRF4JVG66wg}{10.50.40.180}{10.50.40.180:9300} committed version [19818]])
[2018-04-29T06:26:56,537][DEBUG][o.e.a.b.TransportBulkAction] [ip-10-50-30-150] failed to execute pipeline [xpack_monitoring_6] for document [.monitoring-es-6-2018.04.29/doc/null]
java.lang.IllegalArgumentException: pipeline with id [xpack_monitoring_6] does not exist
	at org.elasticsearch.ingest.PipelineExecutionService.getPipeline(PipelineExecutionService.java:194) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.access$100(PipelineExecutionService.java:42) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) [?:?]
	at java.lang.Thread.run(Thread.java:844) [?:?]
[2018-04-29T06:26:57,038][INFO ][o.e.l.LicenseService     ] [ip-10-50-30-150] license [927ef48c-0c1a-491b-b514-f17bb3cdebf7] mode [basic] - valid
[2018-04-29T06:26:57,075][INFO ][o.e.h.n.Netty4HttpServerTransport] [ip-10-50-30-150] publish_address {10.50.30.150:9200}, bound_addresses {10.50.30.150:9200}, {[fe80::5c:6aff:fe2b:101e]:9200}
[2018-04-29T06:26:57,075][INFO ][o.e.n.Node               ] [ip-10-50-30-150] started
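
The "pipeline with id [xpack_monitoring_6] does not exist" error above can be checked directly; X-Pack monitoring normally creates that ingest pipeline shortly after the cluster forms, so this should return it once things settle (a quick check, not from the original post):

curl -XGET '10.50.30.150:9200/_ingest/pipeline/xpack_monitoring_6?pretty'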

Let me know if you need any other info.
Thank you.


Have you by any chance installed X-Pack in Kibana but not installed or set it up correctly in Elasticsearch?

Thank you for your reply @Christian_Dahlqvist.

X-Pack is installed on both Kibana and Elasticsearch, and it was working fine for months, but suddenly this red alert happened.

/usr/share/elasticsearch/bin/elasticsearch-plugin list
x-pack

/usr/share/kibana/bin/kibana-plugin list
x-pack@6.1.2

Please let me know what you suggest.

Anything else in the Elasticsearch logs since then?

tailf -n 100 /var/log/elasticsearch/dbus.log
[2018-05-01T07:00:18,258][WARN ][r.suppressed             ] path: /.reporting-*/esqueue/_search, params: {index=.reporting-*, type=esqueue, version=true}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
	at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:165) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:151) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:286) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$4(TransportSearchAction.java:193) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:113) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:86) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:215) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:68) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:405) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:532) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.rest.action.search.RestSearchAction.lambda$prepareRequest$2(RestSearchAction.java:93) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:97) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-6.1.2.jar:6.1.2]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458) [netty-transport-4.1.13.Final.jar:4.1.13.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.13.Final.jar:4.1.13.Final]
	at java.lang.Thread.run(Thread.java:844) [?:?]
[2018-05-01T07:00:22,321][INFO ][o.e.c.s.ClusterApplierService] [ip-10-50-40-185] detected_master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300}, added {{ip-10-50-45-124}{76gR60gxSJCtiO69f5QITQ}{Xl2-ad9zSDaJr6nqaA3wKw}{10.50.45.124}{10.50.45.124:9300},{ip-10-50-45-225}{hf9ZXod6SL27ZIFP5V0KCw}{BDiqgQA5SXSSOJAsdBjjaQ}{10.50.45.225}{10.50.45.225:9300},{ip-10-50-30-72}{Bix5ETL4S5KNyBFR2LgeKQ}{arNwhKlRT-erxk1RQVAKdQ}{10.50.30.72}{10.50.30.72:9300},{ip-10-50-30-106}{zIOpN_3XTxaVtps4sRsrag}{pU9DY062RQqaPvNeb-nyww}{10.50.30.106}{10.50.30.106:9300},{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300},{ip-10-50-30-150}{yIHDymPHQf66XVX0y99tiA}{gPl6rzBqS4aV6ijKpkFmxA}{10.50.30.150}{10.50.30.150:9300},{ip-10-50-40-233}{gWnBJuHxQT2J7C4OmrAgQQ}{K35r-ZkjQn-9iLWVgDsevA}{10.50.40.233}{10.50.40.233:9300},}, reason: apply cluster state (from master [master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300} committed version [20611]])
[2018-05-01T07:00:22,589][INFO ][o.e.l.LicenseService     ] [ip-10-50-40-185] license [927ef48c-0c1a-491b-b514-f17bb3cdebf7] mode [basic] - valid
[2018-05-01T07:00:37,819][INFO ][o.e.d.z.ZenDiscovery     ] [ip-10-50-40-185] failed to send join request to master [{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]
[2018-05-01T07:01:22,377][WARN ][o.e.t.TransportService   ] [ip-10-50-40-185] Received response for a request that has timed out, sent [59058ms] ago, timed out [29058ms] ago, action [internal:discovery/zen/fd/master_ping], node [{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300}], id [79]
grep fail /var/log/elasticsearch/dbus.log | tail -n 40
[2018-05-01T06:59:22,619][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,630][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,638][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,643][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,646][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,651][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,653][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,655][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,662][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,670][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,673][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,677][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,680][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,683][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,685][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,694][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,694][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,697][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,699][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,700][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,703][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,707][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,708][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,711][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,714][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,723][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,729][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,752][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,755][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,757][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,759][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,760][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,765][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,768][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,780][DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,783][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [ip-10-50-40-185] failed to execute on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T06:59:22,786][DEBUG][o.e.a.a.i.s.TransportIndicesStatsAction] [ip-10-50-40-185] failed to execute [indices:monitor/stats] on node [Bix5ETL4S5KNyBFR2LgeKQ]
[2018-05-01T07:00:37,819][INFO ][o.e.d.z.ZenDiscovery     ] [ip-10-50-40-185] failed to send join request to master [{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{54ZpoqUPSCGe1yKxYveXQw}{10.50.40.180}{10.50.40.180:9300}], reason [ElasticsearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.]; ]

It's much easier if you create just one thread with all the information in it 🙂

Thanks @warkolm.
I wasn't sure if this issue was related to Kibana or to ES! Sorry.

Dear @ikakavas and @Christian_Dahlqvist ,

So I will explain my issue again, since I have not been able to solve it yet. I really need help with this.

Kibana is deployed on the same node as ES, and I can curl 10.50.30.150:9200, as you can see below:

curl -XGET 10.50.30.150:9200/_cluster/health?pretty
{
  "cluster_name" : "dbus",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1101,
  "active_shards" : 2203,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 3,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 4341,
  "active_shards_percent_as_number" : 100.0
}
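
One detail in this output: number_of_pending_tasks is 3 and task_max_waiting_in_queue_millis is over four seconds, which suggests the master is slow to process cluster state updates. The queued tasks can be listed with:

curl -XGET '10.50.30.150:9200/_cat/pending_tasks?v'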

But in Kibana I am seeing an error (screenshot omitted). After a while it turns into a different error screen (screenshot omitted), and it gets stuck there.

Traffic reaches Kibana like this:
Client -> nginx -> oauth2-proxy -> kibana -> elasticsearch

The Kibana logs are as follows:

{"type":"log","@timestamp":"2018-05-02T06:22:16Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:22:46Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:23:16Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request complete with error\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:xpack_main@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:reporting@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:searchprofiler@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:tilemap@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:logstash@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:23:46Z","tags":["status","plugin:elasticsearch@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from red to red - Error: socket hang up","prevState":"red","prevMsg":{"code":"ECONNRESET"}}
{"type":"log","@timestamp":"2018-05-02T06:24:18Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:24:48Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:25:18Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request error, retrying\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["error","elasticsearch","admin"],"pid":20963,"message":"Request complete with error\nGET http://10.50.30.106:9200/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip => socket hang up"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:xpack_main@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:reporting@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:searchprofiler@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:tilemap@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:logstash@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from green to red - socket hang up","prevState":"green","prevMsg":"Ready"}
{"type":"log","@timestamp":"2018-05-02T06:25:48Z","tags":["status","plugin:elasticsearch@6.1.2","error"],"pid":20963,"state":"red","message":"Status changed from red to red - Error: socket hang up","prevState":"red","prevMsg":{"code":"ECONNRESET"}}

ES logs from the node that Kibana is pointing to (the snippet starts mid stack trace):

	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:391) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeFailure(TransportBroadcastByNodeAction.java:376) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleException(TransportBroadcastByNodeAction.java:335) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1056) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.transport.TransportService.lambda$onConnectionClosed$11(TransportService.java:884) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) [?:?]
	at java.lang.Thread.run(Thread.java:844) [?:?]
[2018-05-02T06:11:21,498][INFO ][o.e.n.Node               ] [ip-10-50-30-150] closed
[2018-05-02T06:11:25,003][INFO ][o.e.n.Node               ] [ip-10-50-30-150] initializing ...
[2018-05-02T06:11:25,084][INFO ][o.e.e.NodeEnvironment    ] [ip-10-50-30-150] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [2.8gb], net total_space [7.9gb], types [rootfs]
[2018-05-02T06:11:25,084][INFO ][o.e.e.NodeEnvironment    ] [ip-10-50-30-150] heap size [7.9gb], compressed ordinary object pointers [true]
[2018-05-02T06:11:25,098][INFO ][o.e.n.Node               ] [ip-10-50-30-150] node name [ip-10-50-30-150], node ID [yIHDymPHQf66XVX0y99tiA]
[2018-05-02T06:11:25,099][INFO ][o.e.n.Node               ] [ip-10-50-30-150] version[6.1.2], pid[20881], build[5b1fea5/2018-01-10T02:35:59.208Z], OS[Linux/3.10.0-693.17.1.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/9.0.4/9.0.4+11]
[2018-05-02T06:11:25,099][INFO ][o.e.n.Node               ] [ip-10-50-30-150] JVM arguments [-Xms8g, -Xmx8g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2018-05-02T06:11:27,452][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [aggs-matrix-stats]
[2018-05-02T06:11:27,452][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [analysis-common]
[2018-05-02T06:11:27,452][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [ingest-common]
[2018-05-02T06:11:27,452][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [lang-expression]
[2018-05-02T06:11:27,452][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [lang-mustache]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [lang-painless]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [mapper-extras]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [parent-join]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [percolator]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [reindex]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [repository-url]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [transport-netty4]
[2018-05-02T06:11:27,453][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded module [tribe]
[2018-05-02T06:11:27,454][INFO ][o.e.p.PluginsService     ] [ip-10-50-30-150] loaded plugin [x-pack]
[2018-05-02T06:11:30,050][INFO ][o.e.d.DiscoveryModule    ] [ip-10-50-30-150] using discovery type [zen]
[2018-05-02T06:11:30,704][INFO ][o.e.n.Node               ] [ip-10-50-30-150] initialized
[2018-05-02T06:11:30,705][INFO ][o.e.n.Node               ] [ip-10-50-30-150] starting ...
[2018-05-02T06:11:30,849][INFO ][o.e.t.TransportService   ] [ip-10-50-30-150] publish_address {10.50.30.150:9300}, bound_addresses {10.50.30.150:9300}, {[fe80::5c:6aff:fe2b:101e]:9300}
[2018-05-02T06:11:30,860][INFO ][o.e.b.BootstrapChecks    ] [ip-10-50-30-150] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-05-02T06:11:44,898][INFO ][o.e.c.s.ClusterApplierService] [ip-10-50-30-150] detected_master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{LpUcPZ0XQ3Sxs2MVJA1R0w}{10.50.40.180}{10.50.40.180:9300}, added {{ip-10-50-30-106}{zIOpN_3XTxaVtps4sRsrag}{R181RMHHS5miQj3mOt_NOw}{10.50.30.106}{10.50.30.106:9300},{ip-10-50-45-124}{76gR60gxSJCtiO69f5QITQ}{JcL0vAeIQGuGfkRJFuynjQ}{10.50.45.124}{10.50.45.124:9300},{ip-10-50-45-225}{hf9ZXod6SL27ZIFP5V0KCw}{BDiqgQA5SXSSOJAsdBjjaQ}{10.50.45.225}{10.50.45.225:9300},{ip-10-50-40-233}{gWnBJuHxQT2J7C4OmrAgQQ}{K35r-ZkjQn-9iLWVgDsevA}{10.50.40.233}{10.50.40.233:9300},{ip-10-50-30-72}{Bix5ETL4S5KNyBFR2LgeKQ}{arNwhKlRT-erxk1RQVAKdQ}{10.50.30.72}{10.50.30.72:9300},{ip-10-50-40-185}{1RpuRSi1QNmOf5X63ER6Sw}{vlFguEIWTRuDNIoCjmnIvA}{10.50.40.185}{10.50.40.185:9300},{ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{LpUcPZ0XQ3Sxs2MVJA1R0w}{10.50.40.180}{10.50.40.180:9300},}, reason: apply cluster state (from master [master {ip-10-50-40-180}{yYSAp14mTxmRqkoIhQWDcA}{LpUcPZ0XQ3Sxs2MVJA1R0w}{10.50.40.180}{10.50.40.180:9300} committed version [20869]])
[2018-05-02T06:11:45,209][INFO ][o.e.l.LicenseService     ] [ip-10-50-30-150] license [927ef48c-0c1a-491b-b514-f17bb3cdebf7] mode [basic] - valid
[2018-05-02T06:11:45,249][INFO ][o.e.h.n.Netty4HttpServerTransport] [ip-10-50-30-150] publish_address {10.50.30.150:9200}, bound_addresses {10.50.30.150:9200}, {[fe80::5c:6aff:fe2b:101e]:9200}
[2018-05-02T06:11:45,250][INFO ][o.e.n.Node               ] [ip-10-50-30-150] started

It looks like you are using Elasticsearch 6.1.2 with Java 9. According to the support matrix, Java 9 is not (at least officially) supported until version 6.2. I wonder if that could have anything to do with it?
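
One quick way to confirm which JVM every node is actually running, without logging into each machine, is the nodes info API (a sketch; adjust the host to any node you can reach):

curl -s '10.50.30.150:9200/_nodes/jvm?filter_path=nodes.*.jvm.version&pretty'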

Thank you for replying.
I am not sure that is the case, since it was working fine with this config for months and we haven't made any changes to it.
It suddenly broke like this.

By the way, when I restarted Elasticsearch and Kibana, it worked fine for a few minutes, but then stopped again.

Please let me know if you need any other information.

@timroes @ikakavas, any thoughts on this issue? Thanks.

Also, these queries respond quickly:
curl -XGET 10.50.30.150:9200/_cat/master?v
or this
curl -XGET 10.50.30.150:9200/_cluster/health?v

But this query takes ages:
curl -XGET 10.50.30.150:9200/_cat/shards?v
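
Timing the two calls makes the difference concrete; _cat/shards has to gather shard-level stats from every data node, so it suffers most when the cluster is struggling (illustrative commands, not from the original post):

time curl -s -o /dev/null '10.50.30.150:9200/_cat/master?v'
time curl -s -o /dev/null '10.50.30.150:9200/_cat/shards?v'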

Please read this, specifically the "Also be patient" part. It's been less than 24 hours since your last post.

You have started a number of topics and have sent private messages to a number of people, including a lot of information, different errors, and scattered logs here and there. It would be much easier for any of us to help you out if you focus on one topic and one issue at a time.


PROBLEM SOLVED!

Kibana conducts a health check every few seconds by querying the Elasticsearch nodes API (/_nodes). If Elasticsearch does not respond quickly enough, Kibana goes red.

It has been discovered that this is a very costly API call (see "Kibana causes off heap memory problems on elasticsearch masters", elastic/kibana issue #16733 on GitHub), and the health check is in the process of being removed by the Kibana team ("Remove the Health Check", elastic/kibana issue #14163).
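
The cost is easy to reproduce by timing the exact request Kibana makes (URL copied from the Kibana log above, with the %2C separators decoded to commas):

time curl -s -o /dev/null 'http://10.50.30.150:9200/_nodes?filter_path=nodes.*.version,nodes.*.http.publish_address,nodes.*.ip'

If this call stalls while the masters are busy, Kibana sees the connection reset as the "socket hang up" in the logs above.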

Until then, we've taken a few counter-measures:

  1. Increased the time between health checks to 1 hour (see the kibana.yml sketch after the HAProxy config below),
  2. Increased the size of the servers in the cluster (especially the data nodes); the nodes were showing GC pressure like this:

[2018-05-04T08:49:12,857][WARN ][o.e.m.j.JvmGcMonitorService] [ip-10-50-40-233] [gc][4448917] overhead, spent [899ms] collecting in the last [1s]
[2018-05-04T08:49:13,930][WARN ][o.e.m.j.JvmGcMonitorService] [ip-10-50-40-233] [gc][4448918] overhead, spent [921ms] collecting in the last [1s]

  3. Introduced HAProxy to rewrite /_nodes to /_nodes/_local, a local endpoint that doesn't have to touch the cluster masters.

HAProxy config:

frontend elastic-in
    bind :9201
    default_backend elastic-out

backend elastic-out
    # Rewrite any /_nodes* request to /_nodes/_local so only the local node answers.
    # set-path requires HTTP mode; 'mode http' is assumed here or in a defaults section.
    http-request set-path /_nodes/_local if { path_beg /_nodes }
    server localhost 10.50.30.150:9200
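
For reference, the 1-hour interval in counter-measure 1 maps to a kibana.yml setting in milliseconds (a minimal sketch, assuming the 6.x setting name elasticsearch.healthCheck.delay):

# kibana.yml: raise the health check interval from the default 2500 ms to 1 hour
elasticsearch.healthCheck.delay: 3600000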
