We are experiencing a delay in logs being ingested into Elasticsearch. While investigating, we found the following error in the Elasticsearch logs.
Logs are being ingested from the application via Filebeat, processed through Logstash, and then sent to Elasticsearch.
[2024-10-2318:58:54,970][ERROR][o.e..m.c.c.ClusterStatsCollector] [yball1v355ca09] collector [cluster_stats] timed out when collecting data: node [OA1DKLJoT4yfNNW-p9kkJg] did not respond within [10s]
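The error message you're seeing indicates that the monitoring collector on one node timed out after 10s while waiting for another node to return cluster statistics. The timeout itself does not delay ingestion, but it usually points to an overloaded or unresponsive node, and that would slow ingestion down. Here are some steps to help you troubleshoot and resolve this issue: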
1. Check Node Health
Use the _cat/nodes API to check the health of your Elasticsearch nodes:
GET /_cat/nodes?v
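Ensure that every expected node is listed, and check the heap.percent, cpu, and load columns for nodes under pressure. Note that _cat/nodes does not report a red or yellow status; that is the cluster health, which is available from GET /_cluster/health.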
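As a rough illustration of this check, the sketch below reads the JSON form of _cat/nodes and lists nodes whose heap or CPU usage looks high. The base URL, the lack of authentication, and the 85%/90% limits are assumptions for illustration only; adjust them to your cluster.

```python
import json
import urllib.request


def fetch_nodes(base_url="http://localhost:9200"):
    """Fetch GET /_cat/nodes as JSON (base_url is an assumed, unsecured endpoint)."""
    url = base_url + "/_cat/nodes?format=json&h=name,heap.percent,ram.percent,cpu,load_1m"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def flag_busy_nodes(nodes, heap_limit=85, cpu_limit=90):
    """Return the names of nodes whose heap or CPU percentage exceeds the limits."""
    busy = []
    for node in nodes:
        # _cat APIs return values as strings; a missing value is treated as 0.
        heap = int(node.get("heap.percent") or 0)
        cpu = int(node.get("cpu") or 0)
        if heap > heap_limit or cpu > cpu_limit:
            busy.append(node["name"])
    return busy
```

Calling flag_busy_nodes(fetch_nodes()) against a reachable cluster returns the list of node names worth a closer look.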
2. Monitor Resource Usage
Check the resource usage (CPU, memory, disk I/O) on the Elasticsearch nodes. High usage could lead to timeouts.
Use the following command to view the node stats:
GET /_nodes/stats
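The node stats response is large, so it can help to reduce it to a few numbers per node. The sketch below assumes you have already loaded the JSON body of GET /_nodes/stats into a dict. It reads heap usage and the write thread pool, which is named "write" in recent versions (older releases call it "bulk"). The summary field names are made up for this example.

```python
def summarize_node_stats(stats):
    """Return one summary row per node from a GET /_nodes/stats response body."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        write_pool = node.get("thread_pool", {}).get("write", {})
        rows.append({
            "id": node_id,  # same kind of ID as the one shown in the timeout error
            "name": node.get("name"),
            "heap_used_percent": node.get("jvm", {}).get("mem", {}).get("heap_used_percent"),
            "write_queue": write_pool.get("queue", 0),
            "write_rejected": write_pool.get("rejected", 0),
        })
    # Show nodes with the most rejected write requests first.
    rows.sort(key=lambda row: row["write_rejected"], reverse=True)
    return rows
```

A node with high heap usage and a growing write_rejected count is a likely candidate for the one that failed to respond within 10s.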
3. Increase Timeout Settings
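If the timeout is due to temporary load spikes, consider increasing the timeout for cluster stats collection. The error comes from the monitoring collector, so the relevant setting is:
xpack.monitoring.collection.cluster.stats.timeout: 30s
Add it to your elasticsearch.yml configuration file and restart the node to apply the change. Keep in mind that a longer timeout only hides the symptom; if the node regularly takes more than 10s to respond, the underlying load still needs attention.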
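If you would rather not restart the node, the same change can probably be made through the cluster settings API. The sketch below only builds the request. It assumes the setting in question is the monitoring collector's xpack.monitoring.collection.cluster.stats.timeout, that it can be updated dynamically in your version, and that the cluster is reachable without authentication at the given base URL. Please verify all three against the documentation for your release.

```python
import json
import urllib.request

# Assumed full name of the monitoring collector's cluster-stats timeout setting.
SETTING = "xpack.monitoring.collection.cluster.stats.timeout"


def build_timeout_request(timeout="30s", base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /_cluster/settings request raising the timeout."""
    body = {"persistent": {SETTING: timeout}}
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. A persistent setting survives restarts, so remember to reset it once the root cause is fixed.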
4. Check Logstash Performance
Since logs are being processed through Logstash, monitor its performance and resource usage. If Logstash is slow, it can back up the log ingestion pipeline.
Ensure that Logstash is configured correctly and that it can handle the volume of logs being ingested.
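One way to see whether Logstash is the bottleneck is its monitoring API, which by default listens on port 9600 (GET /_node/stats/pipelines). The sketch below assumes you have loaded that response into a dict. The in_flight and ms_per_event names are invented for this example, and the per-event time is only a rough indicator because the duration counter is cumulative.

```python
def pipeline_backlog(stats):
    """Summarize each pipeline from a Logstash /_node/stats/pipelines response body."""
    report = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        events = pipeline.get("events", {})
        received = events.get("in", 0)
        sent = events.get("out", 0)
        report[name] = {
            # Events accepted by the pipeline but not yet delivered to an output.
            "in_flight": received - sent,
            # Average processing time per delivered event; None if nothing was sent.
            "ms_per_event": events.get("duration_in_millis", 0) / sent if sent else None,
        }
    return report
```

Sampling this a few times a minute apart shows the trend: if in_flight keeps growing, Logstash or its Elasticsearch output is not keeping up with Filebeat.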
5. Review Filebeat Configuration
Check the Filebeat configuration for any issues that could be causing delays. Ensure that it is configured to send logs efficiently.
Consider increasing bulk_max_size under output.logstash in filebeat.yml (the default is 2048) so that each batch sent to Logstash carries more events.
6. Cluster Configuration
Review your Elasticsearch cluster configuration for any potential bottlenecks, such as insufficient node resources or improper shard allocation.
Consider adjusting the number of shards or replicas if the index is heavily loaded.
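To get a quick picture of how shards are spread across nodes, you could count the rows returned by GET /_cat/shards?format=json per node. The sketch below assumes that response is already loaded as a list of dicts. The "UNASSIGNED" bucket name is a choice made for this example.

```python
from collections import Counter


def shards_per_node(shards):
    """Count shards per node from a GET /_cat/shards?format=json response body."""
    counts = Counter()
    for shard in shards:
        # Unassigned shards have no node, so they are grouped under one label.
        counts[shard.get("node") or "UNASSIGNED"] += 1
    return counts
```

A node that holds far more shards than the others, or a non-zero UNASSIGNED count, is worth investigating before changing shard or replica settings.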
7. Logs Analysis
Examine the Elasticsearch logs for any additional error messages or warnings that might provide further context on the issue.
Look for logs indicating garbage collection or other resource-related issues.
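For the garbage collection part, Elasticsearch's JvmGcMonitorService writes lines of the form "[gc][N] overhead, spent [X] collecting in the last [Y]". The sketch below scans log lines for that pattern and returns the fraction of time spent in GC. It assumes durations are printed in ms, s, or m, and the function names are made up for this example.

```python
import re

# Matches the JvmGcMonitorService overhead message and captures both durations.
GC_OVERHEAD = re.compile(
    r"\[gc\]\[\d+\] overhead, spent \[([^\]]+)\] collecting in the last \[([^\]]+)\]"
)
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def to_seconds(value):
    """Convert a duration such as '750ms', '2.1s' or '1.5m' to seconds."""
    match = re.fullmatch(r"([\d.]+)(ms|s|m)", value)
    if match is None:
        raise ValueError("unsupported duration: " + value)
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def gc_overhead_ratios(lines):
    """Return spent/window ratios for every GC overhead line found in lines."""
    ratios = []
    for line in lines:
        found = GC_OVERHEAD.search(line)
        if found:
            spent = to_seconds(found.group(1))
            window = to_seconds(found.group(2))
            if window > 0:
                ratios.append(spent / window)
    return ratios
```

You can pass an open log file directly, since it iterates line by line. Ratios that stay close to 1.0 mean the node spends most of its time collecting garbage, which would explain slow responses to the stats collector.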
8. Upgrade Considerations
If you are running an older version of Elasticsearch or the Elastic stack components, consider upgrading to the latest stable version to benefit from performance improvements and bug fixes.