Logstash doesn't seem to be the bottleneck. By sending the LS output to /dev/null, I was able to achieve an average rate of 97,000 documents per second.
When I replace the output with Elasticsearch, it drops to an average of 20,000/s for a short period.
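For anyone wanting to reproduce the comparison, it roughly amounts to swapping the Logstash output between a throwaway sink and the real Elasticsearch output; the host and index names below are placeholders, not my actual config:

```
# Baseline: discard all events by writing them to /dev/null
output {
  file {
    path => "/dev/null"
  }
}

# Real run: same pipeline, but with the Elasticsearch output instead
# output {
#   elasticsearch {
#     hosts => ["http://es-data-node:9200"]
#     index => "logs-test"
#   }
# }
```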
I have tried scaling up to find the limit, first by increasing the number of shards, then the number of indices.
Right now, on a single dedicated data node with 8 CPUs, 32 GB RAM and a 16 GB heap, I can't get past 25k/s regardless of how many or how few shards and indices I have.
Examples:
Ingesting 3 million documents (logs) into a single index with 2 shards and no replicas averages 20k/s.
Ingesting 3 million documents into 2 indices with 2 shards each, still no replicas, averages 23k/s.
With a larger sample it breaks down:
Ingesting 30 million documents into 2 indices with 2 shards each, still no replicas, averages 13k/s.
Ingesting 30 million documents into 6 indices with 20 shards each, still no replicas, averages 14.4k/s.
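In each test the indices were created up front with explicit settings, along these lines (the index name is just a placeholder):

```
PUT /logs-test-1
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}
```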
I was able to improve Logstash throughput by tweaking pipeline.workers and pipeline.batch.size, but for Elasticsearch I have found nothing to tune other than indices/shards/replicas.
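For reference, those Logstash settings live in logstash.yml; the values below are only illustrative of what I tweaked, not a recommendation:

```
# logstash.yml -- illustrative values, tune for your own hardware
pipeline.workers: 8        # defaults to the number of CPU cores on the host
pipeline.batch.size: 1000  # events each worker collects before flushing (default 125)
```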
One of the most important factors for ingest throughput is the performance of the storage, which you have not mentioned at all. This is discussed in the documentation around tuning for indexing speed and having local, fast SSDs is recommended. What type of storage are you using?
It seems I was a bit quick to post, as that is exactly what I have just changed. The storage is on AWS: we had a standard gp3 volume with 125 MB/s throughput and 3,000 IOPS. I have just doubled the volume's throughput and IOPS and I am already seeing an improvement. I will run some more tests and leave my results here for anyone else who might need them.
Thanks for the link to the doc, I will read it now.
That is still way off the speed you get with a local SSD, so probably still qualifies as slow storage.
With this type of storage I would recommend indexing into as few shards as possible, as that results in fewer, larger writes. Also be aware that if you are specifying your own document IDs, each indexing operation needs to be treated as a potential update, which reduces throughput and makes indexing slow down further as indices grow in size.
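As a rough sketch of what that means on the Logstash side (host and index names here are placeholders), simply leave document_id unset so Elasticsearch generates the IDs itself:

```
output {
  elasticsearch {
    hosts => ["http://es-data-node:9200"]
    index => "logs-test"
    # no document_id set, so Elasticsearch auto-generates IDs and every
    # indexing operation is a pure append rather than a potential update
  }
}
```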
AWS EBS gp3 default storage is indeed very slow. It is OK for most apps in our cluster but clearly not enough for Elasticsearch.
Just by doubling the throughput and IOPS I was able to reach 28k/s, which is more than double the previous run, and that with only 2 indices of 2 shards each.
@Christian_Dahlqvist I will take your advice and limit the number of shards and see how that goes. In prod we generate ~350 million log lines a day, and we need to plan for 10x that in the future, so it looks like most of our cost will come from sizing the storage to Elasticsearch's needs.
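Doing the rough math, 350 million documents a day is only about 4,050/s on average (350,000,000 / 86,400 s), so 10x that is roughly 40,500/s sustained, before accounting for peaks or any backfill.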
If you are ingesting log data you may want to look at data streams and a hot-warm architecture. In such an architecture you have a number of nodes with very fast storage, e.g. i4i instances or similar, which handle all indexing. The warm tier then holds indices once they are no longer written to and can run on slower hardware while still serving queries efficiently.
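As a minimal sketch of the moving parts (names, sizes and ages below are placeholders): an ILM policy whose hot phase rolls indices over while they are being written, with a warm phase that later lets ILM migrate them to data_warm nodes automatically, plus an index template that turns the matching indices into a data stream:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {}
      }
    }
  }
}

PUT _index_template/logs-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy",
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}
```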
I was under the impression that data streams were meant more for data such as metrics, and not for logs where you have to track values. Back to reading up on data streams, I guess.
As for warm tiers, I would guess it comes down to mounting storage with different specs depending on the tier. I'll read up on that too.
Looking at the i4i family, I see they have local SSDs, which I am guessing are far more performant than my gp3 storage and better suited for data nodes.
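From what I can tell so far, the tier assignment itself is just a node role in elasticsearch.yml, something along these lines (one block per node type):

```
# elasticsearch.yml on an indexing node with fast local SSD (e.g. i4i)
node.roles: [ data_hot, data_content, ingest ]

# elasticsearch.yml on a node holding older, read-mostly indices on cheaper storage
node.roles: [ data_warm ]
```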