Elasticsearch Performance Issue

Hi All,
We have a single-node Elasticsearch cluster running 7.5.0.

Disk: 250 GB
Memory: 64 GB
CPU: 8 cores
Index: 1
Shards: 1
Volume: 2 million records
Mapping: dynamic
Index structure: PK, {Inner Objects 1...N}
Language: all text in Japanese

We use inner objects, so in total there will only be about 35K documents; the remaining records are applied as updates that add themselves as inner objects inside those docs. A sketch of the resulting document shape is below.
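For illustration, a single document in this model looks roughly like the following (the field names here are made up; the real ones come from dynamic mapping):

```json
{
  "pk": "doc-00123",
  "inner_objects": [
    { "record_id": "r1", "text_ja": "サンプルテキスト" },
    { "record_id": "r2", "text_ja": "別のテキスト" }
  ]
}
```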

It takes hours to load even 100K records using the bulk API. This is the initial indexing phase, so there is no search traffic yet. We tried the following things, but performance has not improved.
At the index level (applied as in the example request below):
1. "number_of_shards": 2
2. "number_of_replicas": 0
3. "refresh_interval": "-1"
4. "translog.durability": "async"
5. "translog.sync_interval": "30s"
At the node level:
1. thread_pool.write.size: 5
2. thread_pool.write.queue_size: 500
3. indices.memory.index_buffer_size: 30%
4. bootstrap.memory_lock: true
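These are static node settings, so they go in elasticsearch.yml and only take effect after a node restart:

```yaml
thread_pool.write.size: 5
thread_pool.write.queue_size: 500
indices.memory.index_buffer_size: 30%
bootstrap.memory_lock: true
```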

Later we set up a three-node cluster and tried loading with the bulk API again, but with disappointing results. Any help on how to optimize for better performance would be great.

Nested documents are stored as separate documents behind the scenes in Elasticsearch. When you add or update a nested document, ALL nested documents of that parent need to be reindexed, which can quickly become expensive and is one of the main drawbacks of nested documents.

In your case it sounds like each nested document is updated individually, and with 2 million records across 35K documents, each document on average has around 57 nested documents. This means that updating each of the nested documents individually causes (57 + 1) * 57 = 3,306 documents to be reindexed behind the scenes, as the parent and all of its nested documents are reindexed on every single update.

Another issue that is probably hurting performance is that you are updating the same document many times. Frequently updating a single document can quickly lead to poor performance, as it generates many small segments that then have to be merged, which is an expensive operation.

My advice would therefore be to group all updates by document and perform one single large update per document rather than many small ones; a sketch of the two bulk formats follows. That should greatly improve your performance. I do not think there is any magical tuning that will noticeably help without changing how you handle updates.
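To make that concrete (index name, IDs, and field names are made up for illustration): instead of a bulk file that appends one record at a time to the same document, e.g. via scripted updates like

```
{ "update": { "_index": "my-index", "_id": "doc-00123" } }
{ "script": { "source": "ctx._source.inner_objects.add(params.obj)", "params": { "obj": { "record_id": "r1", "text_ja": "テキスト1" } } } }
{ "update": { "_index": "my-index", "_id": "doc-00123" } }
{ "script": { "source": "ctx._source.inner_objects.add(params.obj)", "params": { "obj": { "record_id": "r2", "text_ja": "テキスト2" } } } }
```

you would collect all records for a parent up front and index the complete document once:

```
{ "index": { "_index": "my-index", "_id": "doc-00123" } }
{ "pk": "doc-00123", "inner_objects": [ { "record_id": "r1", "text_ja": "テキスト1" }, { "record_id": "r2", "text_ja": "テキスト2" } ] }
```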

Hi Christian,
Thanks for your response. Sure, we can try consolidating all updates and performing a single update per document. However, just to be clear, we are using inner objects, not nested documents, which we deliberately avoided because of exactly that bottleneck. A mapping sketch of what we mean is below.
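Roughly like this (field names are placeholders): with the default object type, which is also what dynamic mapping produces for JSON objects, inner objects are flattened into the parent Lucene document, whereas nested would create hidden separate documents:

```
PUT /my-index
{
  "mappings": {
    "properties": {
      "pk":            { "type": "keyword" },
      "inner_objects": { "type": "object" },
      "inner_nested":  { "type": "nested" }
    }
  }
}
```

We only use the object variant; the nested field is shown here just for contrast.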

In that case it may be the frequent updates of individual documents that are hurting performance. As soon as you update a document that has not yet been flushed to a segment, a refresh is triggered, which basically renders your refresh_interval setting irrelevant.

OK, thanks again for your insights, Christian. By the way, does running the bulk API through multiple curl commands in parallel, all hitting the same data node, have any performance impact?

Not if you target different documents.
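For example, something like this is fine as long as each file covers a disjoint set of document IDs (file and index names are placeholders; each NDJSON file must end with a newline):

```sh
# Two bulk requests in parallel, each touching a disjoint set of parent documents.
curl -s -X POST "localhost:9200/my-index/_bulk" \
  -H 'Content-Type: application/x-ndjson' --data-binary @bulk-part1.ndjson &
curl -s -X POST "localhost:9200/my-index/_bulk" \
  -H 'Content-Type: application/x-ndjson' --data-binary @bulk-part2.ndjson &
wait
```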

Big thanks, Christian. We changed the format of the bulk load file to one big document per PK with all associated inner objects. The load now takes only a few seconds instead of the 30-40 minutes it took earlier.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.