Performance implications of using mongo id as elastic _id

Hi, I am wondering whether there is a performance issue when using a MongoDB ObjectId as the _id in Elasticsearch instead of letting Elasticsearch autogenerate its own.

I am doing this to prevent duplicates when syncing, and I'm concerned that once I have a lot of data it will come back to haunt me, based on the information in this blog post:

https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1

if you are using your own ID, try to pick an ID that is friendly to Lucene. Examples include zero-padded sequential IDs, UUID-1, and nanotime; these IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random and offer poor compression and slow down Lucene.
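For comparison with the ID patterns listed in the quote: a MongoDB ObjectId is 12 bytes, and the first 4 bytes are a big-endian Unix timestamp in seconds (the rest is a per-process random value plus a counter). So ObjectIds created over time share an increasing prefix rather than being fully random like UUID-4. Below is a minimal sketch that reads that timestamp out of the 24-character hex form. The function name and the sample ids are made up for illustration, and whether this prefix makes ObjectIds "friendly enough" for Lucene is something to benchmark, not something this snippet proves.

```typescript
// Decode the creation time embedded in a MongoDB ObjectId hex string.
// objectIdTimestamp is a hypothetical helper, not part of any driver API.
function objectIdTimestamp(hexId: string): Date {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error(`not a 24-char hex ObjectId: ${hexId}`);
  }
  // The first 8 hex chars (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up ids one second apart; the later one also sorts later as a string.
const olderId = "5f0000000000000000000001";
const newerId = "5f0000010000000000000000";
const createdInOrder =
  objectIdTimestamp(olderId).getTime() < objectIdTimestamp(newerId).getTime();
const sortsInOrder = olderId < newerId;
```

If the official `mongodb` driver is already in use, its ObjectId type exposes the same information, so a hand-rolled parser like this is only needed when the ids are handled as plain strings.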

Should I be generating a friendlier unique ID before synchronizing and using that instead? If so, what library is recommended (Node.js)?

Regardless of the ID format, providing IDs will effectively add a read to every write, because we need to check that the given ID does not already exist, whereas with autogenerated IDs we always know they are new. Benchmarking your particular setup is the best way to get a feel for the overhead.
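To make the "provide your own ID" case concrete, here is a rough sketch of how a sync job might build a bulk request body that reuses the Mongo id as the Elasticsearch _id. The `MongoDoc` type, the `toBulkBody` function and the `items` index name are assumptions made for illustration; only the newline-delimited action/source layout comes from the bulk API. The `index` action replaces an existing document with the same _id, which is what prevents duplicates on re-sync, and it is also where the extra lookup described above comes from.

```typescript
// Hypothetical shape of a document read from MongoDB, with _id already
// converted to its 24-char hex string.
interface MongoDoc {
  _id: string;
  [field: string]: unknown;
}

// Build a newline-delimited bulk body: one action line, then one source line.
function toBulkBody(index: string, docs: MongoDoc[]): string {
  const lines: string[] = [];
  for (const { _id, ...source } of docs) {
    // "index" overwrites any existing document that has the same _id.
    lines.push(JSON.stringify({ index: { _index: index, _id } }));
    // _id is a metadata field, so it is kept out of the document source.
    lines.push(JSON.stringify(source));
  }
  // Every line in a bulk body, including the last, ends with a newline.
  return lines.map((line) => line + "\n").join("");
}

const bulkBody = toBulkBody("items", [
  { _id: "5f0000000000000000000001", name: "a" },
]);
```

The resulting string would be sent as the body of a `_bulk` request. Timing the same batch with and without the `_id` field on the action line is one simple way to measure the overhead on a given setup.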

Thanks for the reply. It's not so much the write speed I'm concerned about as the read speed when I have to run aggregations on the data.

Could the choice of ID affect the querying side, or does it only matter when the document is being created?

The choice of ID shouldn't impact search.
