My cluster stopped ingesting due to low disk space but beats did not seem to recover all missed events

netfire · December 10, 2020, 2:35am

Newbie here. I had an incident the other day because I had not been minding my index lifecycle policies. The cluster went yellow because there were unallocated shards (it looked like shards had moved completely off the first node to hit the disk high watermark), and indexing/ingestion stopped completely. I restarted the entire cluster, deleted some old time-based indices, and fixed my ILM policies.

After everything rebalanced and the cluster was green, I noticed that not all events from winlogbeat seem to have been recovered. I spot-checked a local beat log, and the beat had paused because of the low disk space. The log was spammed with "failed to publish events: temporary bulk send failure". After I cleared the issue, the beats started pushing events again, and the index rate made it look like they had caught up on the missed events. However, when I graph event.created vs. @timestamp per hour, it looks like many events from this period are still missing. The only events entering my cluster at this point are from beats.

Here is event.created vs @timestamp:

Here is my indexing rate over time:

And here is a closeup showing the spike, then lowering of the index rate back down to normal for us:
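
For anyone who wants to reproduce the comparison behind the first graph without Kibana, here is a minimal sketch of the idea, assuming the events have already been exported as a list of dicts. It buckets the same events per hour twice: once by @timestamp (when the event occurred) and once by event.created (when the agent first read it). The helper names (`hourly_counts`, `compare_hours`) and the sample data are hypothetical, not taken from my cluster.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events, field):
    """Count events per hour, using the ISO 8601 timestamp stored in `field`."""
    counts = Counter()
    for event in events:
        # Assumes timestamps without a trailing "Z" so that
        # datetime.fromisoformat also works on older Python 3 versions.
        ts = datetime.fromisoformat(event[field])
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def compare_hours(events):
    """Return (hour, count by @timestamp, count by event.created) rows."""
    occurred = hourly_counts(events, "@timestamp")
    created = hourly_counts(events, "event.created")
    hours = sorted(set(occurred) | set(created))
    # Counter returns 0 for hours that are missing on one side.
    return [(h.isoformat(), occurred[h], created[h]) for h in hours]


# Hypothetical sample: two events occurred during an outage (11:00 hour)
# but were only read and shipped after recovery (14:00 hour).
sample = [
    {"@timestamp": "2020-12-08T10:15:00", "event.created": "2020-12-08T10:15:02"},
    {"@timestamp": "2020-12-08T11:05:00", "event.created": "2020-12-08T14:30:00"},
    {"@timestamp": "2020-12-08T11:40:00", "event.created": "2020-12-08T14:30:05"},
    {"@timestamp": "2020-12-08T14:20:00", "event.created": "2020-12-08T14:30:10"},
]

rows = compare_hours(sample)
for hour, by_timestamp, by_created in rows:
    print(hour, by_timestamp, by_created)
```

If the backlog had been fully recovered, I would expect the outage hours to show normal counts in the @timestamp column and a matching spike in the event.created column after recovery. In my graphs the @timestamp counts for the outage hours stay low, which is why I think events were lost.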

I was under the impression that the beats (only winlogbeat and auditbeat on Windows at this point) would be able to resume from where they left off. Is there anything I should know about how they are expected to behave in a situation like this?

Thanks

netfire · December 22, 2020, 6:52am

Bump. Any ideas?

system · January 19, 2021, 8:53am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
