How to slow down a large amount of data coming from Filebeat?

@A_B as you make the move to Kafka, here are a few things that will really boost throughput (config sketches follow the list)...

  1. increase pipeline.batch.size from the default of 125 to at least 1024 (1280 was best in my environment)
  2. increase pipeline.batch.delay from the default of 50 ms to at least 500 ms (1000 ms was best in my environment)
  3. in the kafka input, set max_poll_records to the same value as pipeline.batch.size
  4. each thread defined by consumer_threads in the kafka input is its own Kafka consumer instance, so 4 Logstash instances with 2 threads each means 8 consumers. Your Kafka topic must have at least 8 partitions for all of those consumers to receive data, and you will want more partitions than your current needs so you can easily scale in the future.
  5. the number of pipeline.workers should be at least equal to consumer_threads.
  6. the kafka output should set batch_size to at least 16384 (this maps to the Kafka producer's batch.size, in bytes)
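
To make items 1, 2, and 5 concrete, here is a minimal logstash.yml sketch using the starting values above. The pipeline.workers value of 2 just mirrors the consumer_threads from the example in item 4; size it for your own setup:

```yaml
# logstash.yml -- starting points from items 1, 2, and 5
pipeline.batch.size: 1024    # default is 125; 1280 was the sweet spot in my environment
pipeline.batch.delay: 500    # milliseconds; default is 50, 1000 worked best for me
pipeline.workers: 2          # at least equal to consumer_threads in the kafka input
```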

You may end up tweaking some of the buffer settings as well, but the above will give you a good starting point.
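
For reference, here is how items 3, 4, and 6 might look in the pipeline config itself. The broker addresses, topic names, and group_id are placeholders for your environment:

```
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # placeholder brokers
    topics            => ["filebeat"]               # placeholder topic name
    group_id          => "logstash"                 # placeholder consumer group
    consumer_threads  => 2        # 4 Logstash instances x 2 threads = 8 consumers (item 4)
    max_poll_records  => "1024"   # match pipeline.batch.size (item 3); this option takes a string
  }
}

output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topic_id          => "processed-events"         # placeholder topic name
    batch_size        => 16384                      # producer batch.size, in bytes (item 6)
  }
}
```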

Rob
