Best practice in generating document ID

Arinto_Murdopo · February 11, 2014, 6:26am

Hi all,

Is there a best practice for generating document IDs in Elasticsearch?
Let's say we want to distribute the data evenly across the cluster and be
able to update documents quickly.

Say each document holds user information in the following JSON format, and
I index all the fields:
{"user_id":someLongValue, "name":someStringValue}, for example
{"user_id":123, "name":"arinto"}

Based on these simple requirements, I've found two possible approaches so far:

1. This article (
http://exploringelasticsearch.com/book/advanced-techniques/routing.html)
says that document IDs should be either UUIDs or monotonically
increasing in order to distribute the data evenly across the cluster's
shards. That means I need to generate a UUID when indexing new data. But if
I later want to retrieve the document and update it with a new field or new
data, I cannot use the 'get' API, because the UUID is generated
independently of any document field. Hence I need to use the 'search' API,
which I assume does not perform as well as the 'get' API. (Please correct me
if I'm wrong.) If all the fields are indexed, can I improve 'search' API
performance to be close to 'get' API performance?
2. If I use the "user_id" as the document ID, I can easily use the 'get'
API to retrieve the document, but I'm afraid the documents will not be
evenly distributed, because the "user_id" values are neither UUIDs nor
"monotonically increasing", i.e. they are sparse.

Thank you and best regards,

Arinto
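
For readers comparing the two approaches in the question, here is a rough sketch of the request shapes each one implies. It only builds the method, path, and body and sends nothing. The index name `users` and the helper function names are made up for illustration, and the paths assume the typeless `_doc` / `_update` endpoints of recent Elasticsearch releases; the versions current when this thread was written put a type name in the path instead.

```python
import json

INDEX = "users"  # hypothetical index name, not from the thread


def get_request(user_id):
    """Approach 2: user_id doubles as the document ID, so a direct get works."""
    return ("GET", f"/{INDEX}/_doc/{user_id}", None)


def search_request(user_id):
    """Approach 1: the ID is an unrelated UUID, so find the user by field value."""
    body = {"query": {"term": {"user_id": user_id}}}
    return ("POST", f"/{INDEX}/_search", body)


def update_request(doc_id, partial_doc):
    """Partial update; the document ID has to be known up front."""
    return ("POST", f"/{INDEX}/_update/{doc_id}", {"doc": partial_doc})


for method, path, body in (
    get_request(123),
    search_request(123),
    update_request(123, {"name": "arinto"}),
):
    print(method, path, json.dumps(body) if body else "")
```

With approach 1, an update needs a search first to discover the UUID and then an update call; with approach 2 the update can be addressed by `user_id` directly.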

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/14eaa93e-0690-47e0-af9c-d8d84bdb59fd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Randall_McRee · February 11, 2014, 6:43am

#2. The ID is hashed to choose the shard, so you'll be fine, and get is always faster than search. A lot.
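
To illustrate the "it's a hash" point, here is a minimal sketch of hash-based shard selection. It is not Elasticsearch's actual routing code: MD5 stands in for whatever hash function the server uses, and the shard count of 5 and the sample IDs are assumptions. It contrasts hashing with a naive modulo on sparse IDs.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # assumed number of primary shards


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the document ID (MD5 is only a stand-in)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sparse, non-sequential user IDs: only every 1000th value is used.
user_ids = [i * 1000 for i in range(10_000)]

# Hashing first spreads the IDs across all shards.
counts = Counter(shard_for(uid) for uid in user_ids)

# Taking the raw ID modulo the shard count does not: every ID here is a
# multiple of 5, so they all land on shard 0.
naive = Counter(uid % NUM_SHARDS for uid in user_ids)

for shard in range(NUM_SHARDS):
    print(f"shard {shard}: hashed={counts[shard]} naive={naive[shard]}")
```

Because the shard depends on the hash of the ID rather than on the ID's numeric value, gaps or patterns in `user_id` should not by themselves skew the distribution.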

