Performance implications of using mongo id as elastic _id

Luke_Rossy · May 29, 2018, 6:01pm

Hi, I am wondering whether there is a performance penalty for using a MongoDB ObjectId as the _id in Elasticsearch instead of letting Elasticsearch autogenerate its own.

I am using this to prevent duplicates when syncing, and I'm concerned that once I have a lot of data this will come back to haunt me, based on the information in this blog post:

https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1

"If you are using your own ID, try to pick an ID that is friendly to Lucene. Examples include zero-padded sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random and offer poor compression and slow down Lucene."

Should I be generating a friendlier unique ID before synchronizing and using that instead? If so, which library is recommended for Node.js?
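
For context on the question above: a MongoDB ObjectId is 12 bytes, and the first 4 bytes are a creation timestamp in seconds, so its hex form is roughly time-ordered rather than fully random like UUID-4. Whether that makes a measurable difference in Lucene is not something this thread establishes. The snippet below is only a minimal sketch of that structure; the function name `objectIdTimestamp` and the two sample IDs are made up for illustration.

```javascript
// Sketch: decode the creation time embedded in a MongoDB ObjectId hex string.
// objectIdTimestamp is a hypothetical helper, not part of any driver.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up sample IDs: only the leading timestamp bytes differ meaningfully.
const olderId = "5b0d9a10a1b2c3d4e5f60001";
const newerId = "5b0eeb90a1b2c3d4e5f60002";

// IDs created later also sort later as plain lowercase hex strings.
console.log(objectIdTimestamp(olderId).toISOString());
console.log(objectIdTimestamp(newerId).toISOString());
console.log(olderId < newerId);
```

Because the timestamp comes first, IDs generated over time share long common prefixes, which is the property the quoted blog post attributes to sequential IDs.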

Mark_Harwood · May 30, 2018, 2:41pm

Regardless of the ID format, providing your own IDs effectively adds a read to every write, because we need to check that the given ID does not already exist, whereas autogenerated IDs are always known to be new. Benchmarking your particular setup is the best way to get a feel for the overhead.
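
To make the trade-off concrete, here is a rough sketch of what "providing IDs" looks like in a Bulk API request body when the MongoDB _id is reused. The helper name `buildBulkBody`, the index name, and the sample document are assumptions for illustration, and older Elasticsearch versions may also expect a type in the action line. Re-sending the same batch overwrites documents with matching IDs instead of creating duplicates, which is where the extra lookup per write comes from.

```javascript
// Sketch: build a Bulk API body (newline-delimited JSON) that reuses each
// MongoDB _id as the Elasticsearch _id. buildBulkBody is a hypothetical helper.
function buildBulkBody(indexName, mongoDocs) {
  const lines = [];
  for (const doc of mongoDocs) {
    // _id is document metadata in Elasticsearch, so keep it out of the source.
    const { _id, ...source } = doc;
    // Action line: "index" creates the document or replaces one with the same _id.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: String(_id) } }));
    // Source line: the remaining fields of the document.
    lines.push(JSON.stringify(source));
  }
  // Every line, including the last, must end with a newline.
  return lines.map((line) => line + "\n").join("");
}

// Made-up sample document and index name.
const sampleDocs = [{ _id: "5b0d9a10a1b2c3d4e5f60001", name: "a" }];
const bulkBody = buildBulkBody("items", sampleDocs);
console.log(bulkBody);
```

Timing a batch built this way against the same batch indexed without explicit IDs would be one way to measure the overhead on a given cluster.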

Luke_Rossy · May 30, 2018, 4:22pm

Thanks for the reply. It's not so much the write speed I'm concerned about as the read speed when I have to run aggregations on the data.

Could this affect the querying side, or does it only matter when the document is being created?

Mark_Harwood · May 30, 2018, 4:24pm

The choice shouldn't impact search.

system · June 27, 2018, 4:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
