First of all, I am very excited about this Ingest Node feature, as it is exactly what I need: Logstash looks too costly just for parsing the logs sent by Filebeat and pushing them to Elasticsearch in my F-E-L-K stack. But as far as I know there is a maximum number of indexing threads, fixed to the number of processors, and the queue is fixed at 200 (only 50 for bulk indexing operations). In my use case I plan to install Filebeat on over a thousand nodes; currently three Logstash servers successfully handle that load and push the data into a two-node Elasticsearch cluster. If we use the ingest node feature and remove Logstash from the stack, how is a two-node Elasticsearch cluster going to handle 1000 Filebeat connections? How is this maximum-indexing-threads problem solved in this approach?
What I've seen so far is that, when using ingest, most of the time is still spent on the indexing part. So if your current two-node cluster is handling the load well, I don't expect the bulk threadpool to get exhausted by moving to node ingest. However, this depends on the load LS currently has and what kind of pipelines you run.
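(For anyone new to node ingest: a pipeline is just a set of processors applied on the ES side at index time. A minimal sketch, with a made-up pipeline name and grok pattern, just to show the shape of it:)

```
PUT _ingest/pipeline/my-filebeat-pipeline
{
  "description": "hypothetical pipeline that parses a simple app log line",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}"]
      }
    }
  ]
}
```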
I would gradually move from LS to node ingest (a couple of Filebeat instances at a time) and each time check how the cluster is handling it.
If you do need more capacity, you can always add an ingest-only node to the cluster (by setting node.master: false and node.data: false). Also make sure that all your nodes are configured as hosts to connect to in the Filebeat ES output, roughly as sketched below.
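A rough sketch of both pieces (hostnames and the pipeline name are made up; verify the option names against the docs for your ES/Filebeat version):

```yaml
# elasticsearch.yml on the dedicated ingest-only node:
# not master-eligible, holds no data, only runs ingest pipelines
node.master: false
node.data: false
node.ingest: true
```

```yaml
# filebeat.yml -- list every ES node so the load is spread across the cluster
output.elasticsearch:
  hosts: ["es-node1:9200", "es-node2:9200", "es-ingest1:9200"]
  pipeline: "my-filebeat-pipeline"   # hypothetical ingest pipeline name
```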
Currently just two Elasticsearch nodes handle the load because, even though I have 1000+ Filebeat connections, Elasticsearch only faces 3 Logstash servers, which use 3*16 threads running in parallel to ingest data into Elasticsearch. But as soon as I remove Logstash and have Filebeat push data directly to Elasticsearch, there will be at least 1000+ clients sending data in parallel. How is Elasticsearch going to handle that? Will it queue some 50 requests and stop accepting connections until it has processed the queued ones?
If your ES cluster can't handle the load any more (all bulk threads are busy and the queue is full), then nodes will return the 429 HTTP response code. They will keep returning it until requests have been taken off the queue and processed. Filebeat will retry sending the same batch of logs until ES has successfully processed it, and then continue where it left off and send the next batch of logs to ES.
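On the Filebeat side this retry behaviour is built in; the main knob you may want to look at is the size of the bulk requests it sends, e.g. (value shown is only illustrative, check the default for your Filebeat version):

```yaml
output.elasticsearch:
  hosts: ["es-node1:9200", "es-node2:9200"]
  bulk_max_size: 50   # events per bulk request; smaller batches put less pressure on the bulk queue
```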
So there could be some time lag under heavy load, but no data loss. That solves the issue for me. Would it be better to increase the queue size to, say, 1500, so that requests get queued instead of Elasticsearch sending 429 responses?
No, I wouldn't do that; it just increases memory usage and doesn't buy you anything, since Filebeat simply retries the failed batch before continuing with the next logs.
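For reference only, the setting being discussed is the bulk threadpool queue in elasticsearch.yml (the exact key is version-dependent; in later releases the bulk pool was renamed to write). As said above, I'd leave it at its default:

```yaml
# shown only for reference; raising it mainly means more queued requests sitting in heap,
# it does not add indexing throughput
thread_pool.bulk.queue_size: 1500   # the value proposed in the question, not recommended here
```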