Metricbeat output stops appearing in Kibana after a few minutes

I've had Metricbeat -> Elasticsearch -> Kibana working fine on a number of deployments for some time. I'm now building a completely new stack, and what I'm seeing is that Metricbeat output looks normal until Metricbeat has been running for a few minutes, after which no further Metricbeat output appears in Kibana.

The Metricbeat logs appear to continue normally, with no obvious change between before and after the data stops appearing in Kibana. There is nothing obviously relevant in the Elasticsearch logs. Other indices appear correct in both Elasticsearch and Kibana.

There are entries in the Elasticsearch logs that I don't understand, sequences like this:

[2018-04-24T13:44:02,392][INFO ][o.e.c.s.ClusterService ] [live-monitor-1] removed {{live-monitor-3}{W0B2alNyQOqXerA94r-1PA}{O3IHPNZ2SoO-X6LRMPYDjw}{172.31.13.99}{172.31.13.99:9300},}, reason: zen-disco-receive(from master [master {live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300} committed version [21444]])
[2018-04-24T13:44:05,973][INFO ][o.e.c.s.ClusterService ] [live-monitor-1] added {{live-monitor-3}{W0B2alNyQOqXerA94r-1PA}{O3IHPNZ2SoO-X6LRMPYDjw}{172.31.13.99}{172.31.13.99:9300},}, reason: zen-disco-receive(from master [master {live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300} committed version [21446]])
[2018-04-24T13:48:34,838][INFO ][o.e.d.z.ZenDiscovery ] [live-monitor-1] master_left [{live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300}], reason [transport disconnected]
[2018-04-24T13:48:34,838][WARN ][o.e.d.z.ZenDiscovery ] [live-monitor-1] master left (reason = transport disconnected), current nodes: nodes:
{live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300}, master
{live-monitor-1}{xE9eAhNXQ0uBQAaPlqfuFQ}{6D5AWlLJQuWHcJHhj727Iw}{172.31.11.96}{172.31.11.96:9300}, local
{live-monitor-3}{W0B2alNyQOqXerA94r-1PA}{O3IHPNZ2SoO-X6LRMPYDjw}{172.31.13.99}{172.31.13.99:9300}

[2018-04-24T13:48:37,849][INFO ][o.e.c.s.ClusterService ] [live-monitor-1] detected_master {live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300}, reason: zen-disco-receive(from master [master {live-monitor-2}{49PcsRhMSUitVmH6PHbTCA}{_bdlSndNSrSL7g2AFcXbhg}{172.31.12.89}{172.31.12.89:9300} committed version [21474]])

but I've no idea whether these are relevant to the problem.
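For reference, the master_left / detected_master entries above can be counted per hour to gauge how often the master is changing. A minimal sketch (the function name is made up, and the default package-install log path shown in the comment is an assumption):

```shell
#!/bin/sh
# count_master_events: print per-hour counts of master_left / detected_master
# lines from an Elasticsearch log, to see how often the cluster master changes.
# The leading timestamp format is [YYYY-MM-DDTHH:..., so characters 2-14 of
# each matching line give the date and hour.
count_master_events() {
  grep -E 'master_left|detected_master' "$1" | cut -c2-14 | sort | uniq -c
}

# Typical use (log path for a package install is an assumption):
# count_master_events /var/log/elasticsearch/elasticsearch.log
```

A steady trickle of these events suggests discovery flapping rather than a one-off network blip.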

If I restart Metricbeat its output appears in Kibana again, for a few minutes, until it stops again. This applies to all the hosts on which I'm running Metricbeat in this system deployment.

How can I diagnose and fix what is going on?

OK, so the documents are coming through ... very very slowly. I've created an "ingested" timestamp on each document as well as the @timestamp, and there's around 25 minutes between them. Where are 25 minutes' worth of Metricbeat input getting queued up, and why? None of the machines involved show any significant CPU usage.
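For anyone wanting to reproduce this measurement: one way to stamp documents at ingest time is an Elasticsearch ingest pipeline with a set processor. A sketch (the pipeline name is made up; Metricbeat would also need output.elasticsearch.pipeline in metricbeat.yml pointing at it):

```json
PUT _ingest/pipeline/add-ingested-timestamp
{
  "description": "Record the time each document was ingested, alongside @timestamp",
  "processors": [
    {
      "set": {
        "field": "ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

Comparing "ingested" against @timestamp per document then shows the end-to-end lag directly in Kibana.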

I suggest you run Metricbeat with publish debug logging enabled: -d publish.

This will show when the events are published and whether there are any errors delivering them to Elasticsearch.
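Concretely, you can either run it in the foreground as `metricbeat -e -d "publish"`, or enable the same selector persistently in the config file:

```yaml
# metricbeat.yml - enable debug output for the publish selector only,
# so the log shows each batch of events being sent to the output
logging.level: debug
logging.selectors: ["publish"]
```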

Will do, if the problem recurs (I've finished that test and deleted that batch of VMs, but I'll be building a new set and doing it again shortly).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.