Updating documents when using auto-generated IDs

jerrymj · July 28, 2020, 5:25am

Hi,

I have an index running on 6.3. It holds about 11MM documents, and because the documents are nested, the underlying document count is much larger. Each document represents information stored in multiple databases relating to a specific id (a long int). Since that id is unique, it is also provided as the _id for the document.

I am looking at ways to tune the indexing for this. One thing that caught my eye when reading the guide was the suggestion to use autogenerated IDs.

Here are my concerns.

1. Would unique IDs such as the ones I provide be enough, or does the autogenerated ID provide something much better?
2. If I were to use autogenerated IDs, how would I update a document using the Update API? I would no longer have the ID (since it is autogenerated), and I wasn't intending to store the autogenerated IDs in any other database / datastore.

Thinking aloud on the second point, should I be making a retrieval call to Elasticsearch just to look up the autogenerated ID and "upsert" accordingly?

Regards,
Jerry

warkolm · July 28, 2020, 6:11am

When you say tuning the index, what exactly are you trying to achieve, or what problem do you have that you are looking to solve?

Christian_Dahlqvist · July 28, 2020, 8:03am

Updating large nested documents can be expensive irrespective of what type of ID you are using, as all nested documents need to be reindexed behind the scenes. Using auto-generated IDs can speed up indexing of immutable documents but does not help when updating.
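
For illustration, the difference between the two ID strategies shows up in the bulk action metadata: an `index` action without `_id` lets Elasticsearch generate one, while an action with `_id` targets a known document. The following is a minimal sketch that only builds the NDJSON body for the `_bulk` endpoint, with no client library or cluster involved. The index name `records`, the type `_doc`, and the field names are assumptions for the example, not details from the original post.

```python
import json


def bulk_index_lines(index, docs, id_field=None, doc_type="_doc"):
    """Build an NDJSON body for the _bulk API.

    If id_field is None, no _id is sent and Elasticsearch generates one.
    Otherwise the value of doc[id_field] is used as the document _id.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index, "_type": doc_type}
        if id_field is not None:
            meta["_id"] = str(doc[id_field])
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample data; "record_id" stands in for the database key.
docs = [{"record_id": 1001, "name": "a"}, {"record_id": 1002, "name": "b"}]
auto_id_body = bulk_index_lines("records", docs)
explicit_id_body = bulk_index_lines("records", docs, id_field="record_id")
print(explicit_id_body)
```

Re-sending the explicit-ID body overwrites the same documents, whereas re-sending the auto-ID body creates new ones each time, which is why auto-generated IDs fit append-only data rather than data that is updated.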

jerrymj · July 29, 2020, 9:37am

By tuning the index, I meant looking for ways to:

1. Index the documents faster.
2. Consume fewer resources while indexing.

The above is what I really want to solve, all without changing the document structure (size, deep nesting), at least for now.
So as part of that investigation, I assume everything listed in the Elasticsearch suggestions can be tried out.

The reason for my post in the forum is this: if you use autogenerated IDs for indexing, how do you update a document to reflect changes in the database? (Elasticsearch holds data from the DB.)

Christian_Dahlqvist · July 29, 2020, 9:39am

If you update data, using auto-generated IDs does not make sense and will not bring any benefit.

jerrymj · July 29, 2020, 9:47am

Thank you, that helps clarify my main question.

However, your response brings me to a couple of follow-up questions.

1. "all nested documents need to be reindexed behind the scenes" - I would be happy to read more on this, as I am interested to know how costly this operation is.
2. Is it possible to use the Update API as an upsert operation when IDs are auto-generated? (This is me being curious.) The Update API requires you to pass in the ID of the document you wish to update. AFAIK, you would need to make a GET call to Elasticsearch and then, using the ID from the response, make the UPDATE call.

Christian_Dahlqvist · July 29, 2020, 9:52am

The more nested documents and levels, the more costly it is. It can slow down indexing significantly and is the main reason I do not think you will see much improvement from the standard tuning steps, which are largely aimed at immutable, non-nested documents.

The performance impact will be even greater if you update the same document frequently as this can result in a lot of very small segments.

With auto-generated IDs you do indeed need to search before updating, which is why it does not make any sense for your use case.
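
To make the difference concrete, the sketch below describes the HTTP requests each approach would need. It only builds request descriptions and does not talk to a cluster. The index name `records`, the field `record_id`, and the sample values are assumptions for illustration. The URL shape with a `_doc` type follows the 6.x Update API; later versions use a different path.

```python
import json


def update_with_known_id(index, doc_id, changes, doc_type="_doc"):
    """One request: the database key is the document _id, so upsert directly."""
    return [{
        "method": "POST",
        "path": f"/{index}/{doc_type}/{doc_id}/_update",
        "body": {"doc": changes, "doc_as_upsert": True},
    }]


def update_with_auto_id(index, key_field, key_value, changes, doc_type="_doc"):
    """Two requests: search for the document to learn its _id, then update it.

    The second path contains a placeholder because the _id is only known
    after the search response has been read.
    """
    search = {
        "method": "POST",
        "path": f"/{index}/_search",
        "body": {"size": 1, "_source": False,
                 "query": {"term": {key_field: key_value}}},
    }
    update = {
        "method": "POST",
        "path": f"/{index}/{doc_type}/<_id from search hit>/_update",
        "body": {"doc": changes},
    }
    return [search, update]


# Hypothetical values; "record_id" stands in for the database key.
known = update_with_known_id("records", 1001, {"status": "shipped"})
auto = update_with_auto_id("records", "record_id", 1001, {"status": "shipped"})
print(json.dumps(known, indent=2))
print(len(known), "request(s) vs", len(auto))
```

Besides the extra round trip, the search step only sees documents that have already been refreshed, so a document indexed moments earlier may not be found yet. With the database key as the `_id`, the single upsert request avoids both issues.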

system · August 26, 2020, 9:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
