Hi all,
Is there a best practice for generating document IDs in Elasticsearch?
Say we want to distribute the data evenly across the cluster and still be
able to update documents quickly.
Say each document holds user information in this JSON format, and I index
all the fields:
{"user_id": someLongValue, "name": someStringValue}
for example:
{"user_id": 123, "name": "arinto"}
Based on these simple requirements, I've found two possible approaches so far:
- This article
(http://exploringelasticsearch.com/book/advanced-techniques/routing.html)
says the document ID should be either a UUID or monotonically increasing
so that data is evenly distributed across the cluster's shards. That
means I need to generate a UUID when indexing new data. But if I later
want to retrieve a document and update it with a new field or new data,
I can't use the 'get' API, because the UUID is generated independently
of any document field. So I need to use the 'search' API, which I assume
performs worse than the 'get' API (please correct me if I'm wrong). If
all the fields are indexed, can 'search' performance come close to 'get'
performance?
- If I use "user_id" as the document ID, I can easily use the 'get' API to
retrieve the document, but I'm afraid the documents will not be evenly
distributed, because "user_id" is neither a UUID nor monotonically
increasing; the values are sparse.
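The following minimal Python sketch makes the two approaches concrete. It only builds the REST requests (method, path, body) that each approach would send and does not talk to a cluster. The index name "users" and type name "user" are hypothetical. The paths assume the typed URL layout of the Elasticsearch 1.x era (/{index}/{type}/{id}); newer versions drop types (e.g. /users/_doc/123). With a UUID as _id, finding a user by "user_id" takes a term query through the search API. With "user_id" as _id, get and partial update address the document directly.

```python
import json
import uuid

# Hypothetical index and type names; the thread does not specify them.
INDEX = "users"
DOC_TYPE = "user"


def uuid_approach(doc):
    """Approach 1: random UUID as _id, so lookups by user_id need a search."""
    doc_id = str(uuid.uuid4())
    index_req = ("PUT", f"/{INDEX}/{DOC_TYPE}/{doc_id}", doc)
    # The UUID is unrelated to any field, so we must query on user_id
    # to find the document (and learn its _id) before updating it.
    search_req = ("POST", f"/{INDEX}/{DOC_TYPE}/_search",
                  {"query": {"term": {"user_id": doc["user_id"]}}})
    return [index_req, search_req]


def user_id_approach(doc, changes):
    """Approach 2: user_id as _id, so get and update address the doc directly."""
    doc_id = doc["user_id"]
    return [
        ("PUT", f"/{INDEX}/{DOC_TYPE}/{doc_id}", doc),
        ("GET", f"/{INDEX}/{DOC_TYPE}/{doc_id}", None),
        # Partial update: merges `changes` into the stored document.
        ("POST", f"/{INDEX}/{DOC_TYPE}/{doc_id}/_update", {"doc": changes}),
    ]


if __name__ == "__main__":
    doc = {"user_id": 123, "name": "arinto"}
    for method, path, body in user_id_approach(doc, {"name": "Arinto"}):
        print(method, path, json.dumps(body) if body is not None else "")
```

The second approach needs no search round trip before an update, because the ID is known from the data itself.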
Thank you and best regards,
Arinto
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/14eaa93e-0690-47e0-af9c-d8d84bdb59fd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
#2. The ID is hashed to pick the shard, so you'll be fine. And get is always faster than search. A lot.
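A small simulation illustrates the reply's point. Elasticsearch documents its shard choice as roughly shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to the document's _id. Because the ID is hashed first, even sparse, non-sequential IDs scatter evenly. This Python sketch uses MD5 as a stand-in hash, which is an assumption: Elasticsearch uses its own hash function (Murmur3 in recent versions). The list of sparse user IDs is made up. It illustrates the idea and is not Elasticsearch's actual routing code.

```python
import hashlib
from collections import Counter


def shard_for(doc_id, num_primary_shards):
    """Pick a shard as hash(routing) % num_primary_shards.

    MD5 is a stand-in here; Elasticsearch uses its own hash function,
    but any well-mixed hash spreads keys in a similar way.
    """
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards


def distribution(doc_ids, num_primary_shards):
    """Count how many documents land on each shard."""
    counts = Counter(shard_for(d, num_primary_shards) for d in doc_ids)
    return [counts[s] for s in range(num_primary_shards)]


if __name__ == "__main__":
    # Hypothetical sparse user IDs: large, irregular gaps, not sequential.
    sparse_ids = [123 + i * 7919 + (i * i) % 1000 for i in range(10_000)]
    print(distribution(sparse_ids, 5))
```

With 5 primary shards, each shard should receive roughly 2,000 of the 10,000 IDs, even though the IDs themselves are far from evenly spaced.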
On Feb 10, 2014, at 10:26 PM, Arinto Murdopo arinto@gmail.com wrote:
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3AF336F0-8136-43CD-94A4-078AC8CBE53D%40gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.