How many processes can update the same index in parallel?

Dmitry_Reshetnik · May 14, 2021, 5:22pm

I have an index in ES that contains 50-100 fields.
I'm getting all those fields from 10-15 different data sources (as partial documents).
For each data source I want to configure one or more indexer processes that will update the index in ES with the new data.

Are there any limitations on parallel indexing for ES?
Can I have, for example, 5 instances of each indexer (50-100 instances working in parallel within the same index)?
How do I calculate the capacity of the index?
Is it possible to corrupt the index data by having too many indexers running in parallel?

Christian_Dahlqvist · May 15, 2021, 6:14am

How large is the data set in terms of document count?

How frequently is each individual document likely to be updated?

Are you going to use bulk updates?

Which version of Elasticsearch are you using?

Are you updating individual documents directly by ID?

Dmitry_Reshetnik · May 15, 2021, 10:56am

250-500M

Continuously, but the frequency varies: some parts will be updated continuously, others once per day. Over 24 hours, it is unlikely that more than 10M documents will be updated.

Unlikely, because I'm using Kafka as the source of item changes.

7.6.1 with Lucene 8.4

yes

Christian_Dahlqvist · May 15, 2021, 11:22am

Does this mean several updates per second? If not, what periodicity does it translate to?
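For a rough sense of scale, the 10M-documents-per-day figure quoted above can be converted into an average rate. This is only back-of-the-envelope arithmetic: it assumes the updates are spread evenly over the day, which the thread does not confirm, and real traffic from Kafka is likely to be bursty.

```python
# Average update rate implied by "at most ~10M updated documents per 24h".
# Assumption: updates are spread evenly over the day (peaks will be higher).
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds
updates_per_day = 10_000_000        # upper figure quoted in the thread

avg_rate = updates_per_day / SECONDS_PER_DAY
print(f"{avg_rate:.1f} updates/s on average")  # prints "115.7 updates/s on average"
```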

I do not have a lot of recent experience with high-update use cases, but can provide some pointers based on what I have seen here in the forum.

Updating documents without using bulk requests can lead to a lot of small segments being generated, which is inefficient and can have a huge negative impact on performance. I believe some improvements might have been made in later versions, so I would recommend that you upgrade to the latest version.
Elasticsearch has never been optimised for very frequent updates, so if you have e.g. counters that are updated several times per second, you may be better off aggregating these outside Elasticsearch and updating periodically.
I do not think you will see corruption if you have highly concurrent updates, but throughput may be poor and you might get errors due to version conflicts. You need to benchmark to tell for sure, as it is a somewhat unusual use case.
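As an illustration of the pointers above, here is a minimal sketch of one way an indexer could merge partial documents per ID outside Elasticsearch and then send them as a single bulk request. It only builds the newline-delimited JSON body that the `_bulk` endpoint accepts; sending it is left out. The index name `items`, the helper names `coalesce` and `build_bulk_body`, the sample fields, the `retry_on_conflict` value of 3, and the use of `doc_as_upsert` are all assumptions made for the example, not details from this thread.

```python
import json
from collections import defaultdict


def coalesce(changes):
    """Merge partial documents that target the same ID; later fields win."""
    merged = defaultdict(dict)
    for doc_id, partial in changes:
        merged[doc_id].update(partial)
    return dict(merged)


def build_bulk_body(index, merged, retry_on_conflict=3):
    """Render merged partial documents as an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, partial in merged.items():
        # Action line: partial update by ID, retried on version conflicts.
        action = {
            "update": {
                "_index": index,
                "_id": doc_id,
                "retry_on_conflict": retry_on_conflict,
            }
        }
        lines.append(json.dumps(action))
        # Payload line: assumption that a missing document should be created.
        lines.append(json.dumps({"doc": partial, "doc_as_upsert": True}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical change events, e.g. one batch consumed from Kafka.
changes = [
    ("item-1", {"price": 10}),
    ("item-2", {"stock": 5}),
    ("item-1", {"price": 12, "title": "Widget"}),
]

merged = coalesce(changes)
body = build_bulk_body("items", merged)
print(body)
```

Merging per ID before sending means each document appears at most once per batch, which should reduce both the number of update operations and the chance of version conflicts within a single indexer. Conflicts between different indexers remain possible, so batch size and `retry_on_conflict` would still need to be tuned by benchmarking.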

Dmitry_Reshetnik · May 15, 2021, 11:36am

Thank you for the response. We had it working in Azure Search with 9-12 concurrent, permanently running indexers. The fastest stream was processed by 5 single-threaded processes with a constant queue of updates.
Now we are looking to change the scaling approach, which may lead to more indexers running in parallel.

Christian_Dahlqvist · May 15, 2021, 1:31pm

What did that setup look like? How many indices and shards? What throughput did you reach? What is your target now?

system · June 12, 2021, 1:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
