I've given the issue some thought and came up with the idea of temporarily blocking any incoming data so it wouldn't trigger any ingest or bulk operations. I have a few concerns:
I don't want to lose any data coming from the Filebeat clients because the Elasticsearch server was unavailable for too long.
If possible, I don't want to touch the firewall settings.
I also thought about removing the data and ingest roles from the node. Is that a viable way to do it?
If you are using Logstash to collect logs, have you tried working with Persistent Queues? IIUC, your use case can be helped by giving the queue in Logstash a pre-calculated size and applying back-pressure to Filebeat when the queue is full. Filebeat's logs will only be accepted again once the queue has drained. I was able to work through a spike in incoming data using this flow.
Let me explain how I tackled this. It's quite possible there's a better way of achieving the same thing (provided I understood your problem correctly), but this is what worked for me.
In this specific deployment, log ingestion flowed through log file -> Filebeat -> Logstash -> Elasticsearch. Logstash is used to throttle, parse, and enrich log entries. In my case it was a given that spikes could occur during log ingestion.
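For reference, a minimal Filebeat sketch for this kind of pipeline could look like the following. The log path and the Logstash host/port are placeholders, not values from the original setup:

```yaml
# filebeat.yml (sketch) -- ship log files to Logstash instead of Elasticsearch
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log            # placeholder path to your log files

output.logstash:
  hosts: ["logstash.example.com:5044"]  # placeholder Logstash endpoint
```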
Persistent Queues seem to be the Elastic-recommended way to buffer such spikes [1]. With Persistent Queues enabled, Logstash writes all incoming logs from Filebeat to a file-based queue and processes them in order. If the queue fills up (you can set a limit on its size on disk or on the number of events in flight), Logstash stops accepting events from Filebeat until the queue frees up again.
Enabling persistent queues is pretty easy. Refer to the guide for that [2].
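As a rough illustration, the relevant settings live in logstash.yml; the limits below are placeholders, not recommendations:

```yaml
# logstash.yml (sketch) -- enable the persistent queue
queue.type: persisted       # default is "memory"; "persisted" writes events to disk
queue.max_bytes: 4gb        # placeholder cap on the queue's on-disk size
# queue.max_events: 0       # optional: limit by number of events instead (0 = unlimited)
# path.queue: /var/lib/logstash/queue   # optional: where the queue pages are stored
```

Once the queue hits its limit, Logstash stops acknowledging events from the Beats input, and Filebeat backs off and retries, so events stay on the client side instead of being dropped.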