What algorithm does Elasticsearch use to generate the document _id?

ennian001 · January 30, 2019, 9:24am

Is the auto-generated _id a UUID?
If I specify my own ID, will routing by the formula (shard = hash(routing) % number_of_primary_shards) distribute documents unevenly across shards?
I'm confused about this. Can someone help me, please? I'd really appreciate it.
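
The routing formula quoted above can be tried out with a small simulation. This is only a sketch under stated assumptions: it hashes the UTF-8 bytes of the ID with a plain 32-bit Murmur3 (seed 0), which is in the same family as the hash Elasticsearch uses for routing as far as I know, but the byte encoding and seed here are my own choices, so the shard numbers will not necessarily match a real cluster. The class name, method names, and the "user-N" ID pattern are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ShardRoutingSketch {

    // Plain 32-bit Murmur3. Stand-in for the routing hash; encoding and seed are assumptions.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int i = 0;
        // Mix the input four bytes at a time (little-endian).
        while (i + 4 <= len) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
            i += 4;
        }
        // Mix the remaining 1 to 3 bytes, if any.
        int rem = len - i;
        if (rem > 0) {
            int k = 0;
            if (rem == 3) k ^= (data[i + 2] & 0xff) << 16;
            if (rem >= 2) k ^= (data[i + 1] & 0xff) << 8;
            k ^= (data[i] & 0xff);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
        }
        // Final avalanche so that every input bit affects every output bit.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // shard = hash(routing) % number_of_primary_shards, using floorMod so the result is never negative.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        int hash = murmur3(routing.getBytes(StandardCharsets.UTF_8), 0);
        return Math.floorMod(hash, numberOfPrimaryShards);
    }

    // Counts how many of the hypothetical IDs "user-0", "user-1", ... land on each shard.
    static int[] countPerShard(int documents, int numberOfPrimaryShards) {
        int[] counts = new int[numberOfPrimaryShards];
        for (int i = 0; i < documents; i++) {
            counts[shardFor("user-" + i, numberOfPrimaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countPerShard(100_000, 5)));
    }
}
```

Because the ID is hashed before the modulo is taken, even sequential custom IDs should spread roughly evenly. Skew is more likely when a custom routing value with only a few distinct values is used, since every document sharing a routing value lands on the same shard.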

DavidTurner · January 30, 2019, 10:27am

The code that generates auto-generated IDs is below.

https://github.com/elastic/elasticsearch/blob/99f88f15c5febbca2d13b5b5fda27b844153bf1a/server/src/main/java/org/elasticsearch/common/TimeBasedUUIDGenerator.java#L80-L118


// We have auto-generated ids, which are usually used for append-only workloads.
// So we try to optimize the order of bytes for indexing speed (by having quite
// unique bytes close to the beginning of the ids so that sorting is fast) and
// compression (by making sure we share common prefixes between enough ids),
// but not necessarily for lookup speed (by having the leading bytes identify
// segments whenever possible)


// Blocks in the block tree have between 25 and 48 terms. So all prefixes that
// are shared by ~30 terms should be well compressed. I first tried putting the
// two lower bytes of the sequence id in the beginning of the id, but compression
// is only triggered when you have at least 30*2^16 ~= 2M documents in a segment,
// which is already quite large. So instead, we are putting the 1st and 3rd byte
// of the sequence number so that compression starts to be triggered with smaller
// segment sizes and still gives pretty good indexing speed. We use the sequenceId
// rather than the timestamp because the distribution of the timestamp depends too
// much on the indexing rate, so it is less reliable.


uuidBytes[i++] = (byte) sequenceId;
// changes every 65k docs, so potentially every second if you have a steady indexing rate
uuidBytes[i++] = (byte) (sequenceId >>> 16);
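
To make the two casts in the excerpt concrete, here is a minimal sketch of what they extract. It assumes sequenceId is an int, which the excerpt does not show; the class and method names are hypothetical, and it says nothing about the rest of the ID layout, which is not visible in the truncated excerpt.

```java
public class SequenceBytesSketch {

    // Lowest byte of the sequence number: changes on every increment.
    static byte firstByte(int sequenceId) {
        return (byte) sequenceId;
    }

    // Third byte (bits 16 to 23): changes only once every 2^16 = 65,536 increments.
    static byte thirdByte(int sequenceId) {
        return (byte) (sequenceId >>> 16);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 255, 256, 65_535, 65_536, 131_072};
        for (int s : samples) {
            System.out.printf("seq=%7d first=0x%02x third=0x%02x%n",
                    s, firstByte(s) & 0xff, thirdByte(s) & 0xff);
        }
    }
}
```

The first byte differs between neighbouring IDs, while the third byte stays constant across a run of 65,536 consecutive sequence numbers, which matches the "changes every 65k docs" comment in the excerpt.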


ennian001 · January 31, 2019, 2:56am

Thank you, you are very kind. You really helped me a lot.

system · February 28, 2019, 3:08am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
