@A_B as you make the move to Kafka, a few things that will really boost throughput...
- increase `pipeline.batch.size` from the default of 125 to at least 1024 (1280 was best in my environment)
- increase `pipeline.batch.delay` from the default of 50 to at least 500 (1000 was best in my environment)
- in the `kafka` input, set `max_poll_records` to the same value as `pipeline.batch.size`
- each thread defined by `consumer_threads` in the `kafka` input will be an instance of a consumer. So if you have 4 instances with 2 threads, that is 8 consumer instances. Your Kafka topics must have at least 8 partitions for all consumer threads to ingest data. You will want more partitions than you currently need so you can easily scale in the future.
- the number of `pipeline.workers` should be at least equal to `consumer_threads`
- the kafka output should set `batch_size` to at least 16384 (see the sketches after this list for where each setting lives)
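
To make the first few points concrete, here is a minimal sketch of the corresponding `logstash.yml` settings, using the values discussed above. Treat them as a starting point rather than tuned numbers for your environment:

```yaml
# logstash.yml (sketch) -- pipeline tuning values from the points above
pipeline.workers: 2          # at least equal to consumer_threads in the kafka input
pipeline.batch.size: 1024    # default is 125; 1280 worked best in my environment
pipeline.batch.delay: 500    # default is 50 (ms); 1000 worked best in my environment
```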
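
And a sketch of how the `kafka` input and output might look with those settings. The broker addresses and topic names are placeholders for illustration only:

```
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # placeholder brokers
    topics            => ["logstash-in"]            # placeholder topic
    consumer_threads  => 2       # 4 Logstash instances x 2 threads = 8 consumers,
                                 # so the topic needs at least 8 partitions
    max_poll_records  => "1024"  # match pipeline.batch.size
  }
}

output {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # placeholder brokers
    topic_id          => "logstash-out"             # placeholder topic
    batch_size        => 16384   # at least 16384
  }
}
```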
You may end up tweaking some of the buffer settings as well, but the above will give you a good starting point.
Rob