Best practice for generating document IDs


(Arinto Murdopo) #1

Hi all,

Is there a best practice for generating document IDs in Elasticsearch?
Let's say we want to distribute the data evenly across the cluster and be able
to update documents quickly.

Let's say each document holds user information in the following JSON format, and I
index all the fields:
{"user_id":someLongValue, "name":someStringValue}, for example
{"user_id":123, "name":"arinto"}

Based on the simple requirements above, so far I've found 2 possible approaches:

  1. This article (
    http://exploringelasticsearch.com/book/advanced-techniques/routing.html)
    says that the document ID should be either a UUID or monotonically
    increasing so that the data is evenly distributed across the cluster's
    shards. That means I need to generate a UUID when indexing new data. But
    if I later want to retrieve the document and update it with a new field
    or new data, I cannot use the 'get' API, because the UUID is generated
    independently of any document field. Hence I need to use the 'search' API,
    which I assume does not perform as well as the 'get' API. (Please correct
    me if I'm wrong.) If all the fields are indexed, can I improve 'search'
    API performance so that it is close to 'get' API performance?
  2. If I use the "user_id" as the document ID, I can easily use the
    'get' API to retrieve the document, but I'm afraid the documents will
    not be evenly distributed, because the "user_id" values are neither UUIDs
    nor monotonically increasing (they are sparse).

Thank you and best regards,

Arinto

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/14eaa93e-0690-47e0-af9c-d8d84bdb59fd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Randall McRee) #2

Go with #2. The document ID is hashed to pick the shard, so you'll be fine, and get is always faster than search. A lot.
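
To illustrate why hashing makes sparse IDs a non-issue, here is a minimal sketch of hash-then-modulo shard selection. It is only an illustration of the idea: MD5 is used as a stand-in hash (it is not the function Elasticsearch uses), and the shard count, the `shard_for` and `shard_counts` helpers, and the sample user IDs are all made up for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # assumed shard count, chosen only for this illustration


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the ID and taking the remainder.

    MD5 is a stand-in here, not the hash function Elasticsearch uses.
    """
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_counts(doc_ids, num_shards=NUM_SHARDS):
    """Count how many of the given IDs land on each shard."""
    counts = Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)
    return [counts.get(shard, 0) for shard in range(num_shards)]


# Hypothetical sparse, non-sequential user IDs (17, 18, 21, 26, 33, ...).
sparse_user_ids = [i * i + 17 for i in range(10_000)]

if __name__ == "__main__":
    for shard, count in enumerate(shard_counts(sparse_user_ids)):
        print(f"shard {shard}: {count} documents")
```

Each shard should end up with roughly a fifth of the documents even though the IDs themselves are unevenly spaced, because placement depends on the hash of the ID and not on its numeric value.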


In reply to Arinto Murdopo's message of Feb 10, 2014, 10:26 PM.


