Highest ingestion speed

Hi, I have a general architectural question.
I'm going to use Elasticsearch in a way that maximizes data ingestion speed. I don't care about data durability; the most important thing is ingestion throughput.
I get 100k new lines per second in my log file and I have to keep the end-to-end delay under 2-3 seconds (log file -> Logstash -> indexed in Elasticsearch).
Would it be worth creating many data nodes with the replication factor set to 0? Can increasing the number of data nodes while keeping the replication factor at 0 help me increase ingestion throughput?
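Roughly what I have in mind, as a sketch using the elasticsearch-py 8.x client (the host, index name, and shard count are just placeholders, not settings I've tested):

```python
# Sketch: create an ingest-optimized index with no replicas.
# Host, index name, and shard count are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="logs-ingest-test",
    settings={
        "number_of_shards": 6,     # spread primaries across the data nodes
        "number_of_replicas": 0,   # no replicas: faster ingest, no redundancy
        "refresh_interval": "1s",  # docs need to be searchable within 2-3 s
    },
)
```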

Is it time series data? Why does it have to be no more than 2-3 seconds of delay?

Decent NVMe storage should give you 500,000 writes per second easily, but testing is the fun part.

Is it 100k lines per second in a single log file? If that is the case, your bottleneck may very well be tailing, parsing, and reformatting the data before sending it to Elasticsearch, as this is unlikely to scale linearly.
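A quick way to check that, independent of Elasticsearch, is to time the parse step on its own. A rough single-core sketch, assuming a simple space-delimited format (your real Logstash filters, e.g. grok, will be slower than this regex):

```python
# Rough check of whether parsing alone keeps up with 100k lines/s.
# The log path and regex are placeholders for your real format.
import re
import time

LINE_RE = re.compile(r"^(?P<ts>\S+) (?P<level>\S+) (?P<msg>.*)$")

def benchmark(path, limit=1_000_000):
    start = time.perf_counter()
    parsed = 0
    with open(path) as f:
        for i, line in enumerate(f):
            if i >= limit:
                break
            if LINE_RE.match(line):
                parsed += 1
    elapsed = time.perf_counter() - start
    print(f"{parsed / elapsed:,.0f} lines/s on a single core")

benchmark("/var/log/app.log")
```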

With respect to scaling Elasticsearch, having multiple indexing nodes with a suitable primary shard count and no replicas will give the best throughput, at the cost of resiliency and availability. You will need to make sure you are feeding it with parallel bulk requests to get the best out of it, though; how to do that depends on the design of your data-processing pipeline.
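As a sketch of what parallel bulk requests can look like with the Python client's bundled parallel_bulk helper (the host, index name, thread count, and chunk size are placeholders to tune, not recommendations):

```python
# Sketch: fan bulk indexing out over a thread pool with parallel_bulk.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch("http://localhost:9200")

def actions(path):
    # One bulk action per log line; replace _source with your parsed fields.
    with open(path) as f:
        for line in f:
            yield {"_index": "logs-ingest-test",
                   "_source": {"message": line.rstrip()}}

# Consume the generator so per-document failures are surfaced
# rather than silently dropped.
for ok, result in parallel_bulk(es, actions("/var/log/app.log"),
                                thread_count=8, chunk_size=5000):
    if not ok:
        print("failed:", result)
```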
