Restarting Filebeat on AKS Cluster

Hello Elastic Forum!

I've been having issues with Filebeat on our AKS clusters recently: it constantly restarts for no apparent reason. Two of our clusters are affected, one for development and one for production.

I started investigating to find the cause, but so far the only evidence I have found is the logs below:

{
    "log.level": "info",
    "log.logger": "input",
    "message": "input outlet closed",
    "service.name": "filebeat"
}
{
    "log.level": "error",
    "log.logger": "input",
    "message": "Problem putting initial state: input outlet closed",
    "service.name": "filebeat"
}
{
    "log.level": "error",
    "log.logger": "autodiscover.cfgfile",
    "message": "Error creating runner from config: input outlet closed",
    "service.name": "filebeat"
}
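
The "autodiscover.cfgfile" logger belongs to the Kubernetes autodiscover machinery, which is what creates and tears down the per-container inputs. For reference, the kind of configuration involved is the documented hints-based setup; a simplified sketch of that style of config (not a copy of our actual manifest, so paths and filters will differ):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          # standard container log path on Kubernetes nodes
          - /var/log/containers/*${data.kubernetes.container.id}.log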

Despite these restart episodes, Filebeat doesn't stay broken: after a few minutes it becomes healthy again in the cluster. We also captured the following evidence from the Kubernetes probes:

Readiness probe failed: command timed out: "sh -c #!/usr/bin/env bash -e\nfilebeat test output\n" timed out after 5s

Liveness probe failed: command timed out: "sh -c #!/usr/bin/env bash -e\ncurl --fail 127.0.0.1:5066\n" timed out after 5s
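
Both probes are exec probes in the Filebeat pod spec. One experiment I'm considering is simply relaxing their timeouts to rule out slow "filebeat test output" calls; a sketch of the probe section with the timeouts raised (the 30s/15s values are arbitrary, not recommended settings):

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        #!/usr/bin/env bash -e
        filebeat test output
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 15   # raised from the 5s that keeps timing out
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        #!/usr/bin/env bash -e
        curl --fail 127.0.0.1:5066
  initialDelaySeconds: 30
  periodSeconds: 30
  timeoutSeconds: 15   # raised from the 5s that keeps timing out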

In our infrastructure we use the following pattern:

  • Filebeat + Logstash running inside the AKS cluster, collecting and processing application logs (a simplified sketch of the Filebeat output side is shown after this list)
  • Elastic Cloud as the final destination, where the data is stored
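
A minimal sketch of the Filebeat side of that pipeline (the service name and port are placeholders, not our real values); Logstash then forwards to Elastic Cloud over HTTPS, which is the :9243 endpoint visible in the logs further down:

# filebeat.yml (excerpt) - events go to the in-cluster Logstash
output.logstash:
  hosts: ["logstash.logging.svc.cluster.local:5044"]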

Our versions:

[Development cluster]

  • Azure Kubernetes Service (AKS) v1.31.2
  • Filebeat and Logstash v8.17.4
  • Elastic Cloud v8.17.4

[Production cluster]

  • Azure Kubernetes Service (AKS) v1.31.2
  • Filebeat and Logstash v8.17.2
  • Elastic Cloud v8.17.2

Has anyone ever had a similar error, or does anyone know of a way to collect more information?

I also found the following logs in Logstash in the production environment:

Using bundled JDK: /usr/share/logstash/jdk
Sending Logstash logs to /usr/share/logstash/logs which is now configured via log4j2.properties
[2025-03-26T21:22:06,937][ERROR][logstash.outputs.elasticsearch][main][1d0...197] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://[elastic-cloud-domain].azure.elastic-cloud.com:9243/_bulk?filter_path=errors,items.*.error,items.*.status][Manticore::ClientProtocolException] [elastic-cloud-domain].azure.elastic-cloud.com:9243 failed to respond", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2025-03-26T21:29:24,137][ERROR][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash][e4b...197] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://[elastic-cloud-domain].azure.elastic-cloud.com:9243/_monitoring/bulk?system_id=logstash&system_api_version=7&interval=1s&filter_path=errors,items.*.error,items.*.status][Manticore::ClientProtocolException] [elastic-cloud-domain].azure.elastic-cloud.com:9243 failed to respond", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2025-04-03T00:06:13,442][ERROR][com.microsoft.azure.eventprocessorhost.PumpManager][main][06a...f1e] host logstash-e1831fb0-f50f-4086-bd26-86e76390434c: 0: throwing away zombie pump
[2025-04-03T00:07:21,162][ERROR][logstash.inputs.azure.processor][main][06a...f1e] Event Hub: evh-application-logs, Partition: 0 experienced an error com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'nil' with higher epoch of '0' is created hence current receiver 'nil' with epoch '0' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:f11a027a00015b490008e02867edd0f7_G12_B44, SystemTracker:[event-hubs-name]:eventhub:evh-application-logs~3276|evh-application-logs, Timestamp:2025-04-03T00:07:21, errorContext[NS: [event-hubs-name].servicebus.windows.net, PATH: evh-application-logs/ConsumerGroups/evh-application-logs/Partitions/0, REFERENCE_ID: LN_81a1c4_1743638775094_7ab_G12, PREFETCH_COUNT: 300, LINK_CREDIT: 297, PREFETCH_Q_LEN: 0])
[2025-04-15T13:58:59,891][ERROR][logstash.outputs.elasticsearch][main][1d0...197] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://[elastic-cloud-domain].azure.elastic-cloud.com:9243/_bulk?filter_path=errors,items.*.error,items.*.status][Manticore::ClientProtocolException] [elastic-cloud-domain].azure.elastic-cloud.com:9243 failed to respond", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2025-04-15T13:59:01,896][ERROR][logstash.outputs.elasticsearch][main][1d0...197] Attempted to send a bulk request but there are no living connections in the pool (perhaps Elasticsearch is unreachable or down?) {:message=>"No Available connections", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError, :will_retry_in_seconds=>4}
[2025-04-25T00:59:12,634][ERROR][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash][e4b...197] Attempted to send a bulk request but Elasticsearch appears to be unreachable or down {:message=>"Elasticsearch Unreachable: [https://[elastic-cloud-domain].azure.elastic-cloud.com:9243/_monitoring/bulk?system_id=logstash&system_api_version=7&interval=1s&filter_path=errors,items.*.error,items.*.status][Manticore::SocketTimeout] Read timed out", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :will_retry_in_seconds=>2}
[2025-04-25T00:59:14,637][ERROR][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash][e4b...197] Attempted to send a bulk request but there are no living connections in the pool (perhaps Elasticsearch is unreachable or down?) {:message=>"No Available connections", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError, :will_retry_in_seconds=>4}

I applied the following option in my filebeat.yml:

filebeat.shutdown_timeout: 60s

Doc: Configure general settings | Filebeat Reference [8.9] | Elastic

I saw in an open issue on GitHub that this setting can help with the problem; as soon as I have more complete results, I'll come back here and report.

This approach didn't have the effect I had hoped for.

The errors continued, along with a few new ones.
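
As a next step I plan to temporarily raise Filebeat's log verbosity to capture more detail around the restart window. Something along these lines, using the standard logging settings from the reference docs (the selector names are my guess at the relevant components):

# filebeat.yml (excerpt) - temporary, to get more context on the restarts
logging.level: debug
logging.selectors: ["autodiscover", "input", "harvester"]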