ennian001
(Ennian001)
January 30, 2019, 9:24am
Is the auto-generated `_id` a UUID?
If I specify my own IDs instead, will that lead to uneven shard distribution, given the routing formula `shard = hash(routing) % number_of_primary_shards`?
I'm confused about this. Can someone help me, please? I'd really appreciate it.
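To make the routing question concrete, here is a small sketch that counts how many custom IDs land on each primary shard under `shard = hash(routing) % number_of_primary_shards`, with the ID used as the routing value (the default when no custom routing is given). Assumptions: the hash is a standard MurmurHash3 (x86, 32-bit) over the ID's UTF-8 bytes, used as a stand-in and not claimed to reproduce the exact value Elasticsearch computes; the `user-N` ID scheme, the 5 shards and the 100,000 documents are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch: how evenly do custom IDs spread across primary shards under
 * shard = hash(routing) % number_of_primary_shards?
 *
 * ASSUMPTION: murmur3x86_32 is a stand-in hash, not necessarily the exact
 * value Elasticsearch computes for a given _id.
 */
final class ShardDistribution {

    /** Standard MurmurHash3 x86_32 over a byte array. */
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & 0xfffffffc; // length rounded down to a multiple of 4

        // Body: mix 4-byte little-endian blocks.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 0-3 bytes (intentional fall-through).
        int k1 = 0;
        switch (len & 0x03) {
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1: k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** shard = hash(routing) % numPrimaryShards; floorMod keeps negative hashes in range. */
    static int shardFor(String routing, int numPrimaryShards) {
        int hash = murmur3x86_32(routing.getBytes(StandardCharsets.UTF_8), 0);
        return Math.floorMod(hash, numPrimaryShards);
    }

    /** Counts how many of the hypothetical IDs "user-0" .. "user-(numDocs-1)" land on each shard. */
    static int[] countPerShard(int numDocs, int numPrimaryShards) {
        int[] counts = new int[numPrimaryShards];
        for (int i = 0; i < numDocs; i++) {
            counts[shardFor("user-" + i, numPrimaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerShard(100_000, 5);
        for (int s = 0; s < counts.length; s++) {
            System.out.println("shard " + s + ": " + counts[s] + " docs");
        }
    }
}
```

In this sketch the shard depends only on the hash value, not on the order in which IDs are created, so with a well-mixing hash even sequential custom IDs should spread roughly evenly across the shards.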
The code that generates auto-generated IDs is below.
```java
// We have auto-generated ids, which are usually used for append-only workloads.
// So we try to optimize the order of bytes for indexing speed (by having quite
// unique bytes close to the beginning of the ids so that sorting is fast) and
// compression (by making sure we share common prefixes between enough ids),
// but not necessarily for lookup speed (by having the leading bytes identify
// segments whenever possible)
// Blocks in the block tree have between 25 and 48 terms. So all prefixes that
// are shared by ~30 terms should be well compressed. I first tried putting the
// two lower bytes of the sequence id in the beginning of the id, but compression
// is only triggered when you have at least 30*2^16 ~= 2M documents in a segment,
// which is already quite large. So instead, we are putting the 1st and 3rd byte
// of the sequence number so that compression starts to be triggered with smaller
// segment sizes and still gives pretty good indexing speed. We use the sequenceId
// rather than the timestamp because the distribution of the timestamp depends too
// much on the indexing rate, so it is less reliable.
uuidBytes[i++] = (byte) sequenceId;
// changes every 65k docs, so potentially every second if you have a steady indexing rate
uuidBytes[i++] = (byte) (sequenceId >>> 16);
```
This file has been truncated.
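The comment's arithmetic can be checked with a small sketch that looks only at the first two bytes of the ID and counts how many sequential IDs share each 2-byte prefix, for the layout that was tried first (the two lowest bytes of the sequence id) and the one actually used (the 1st and 3rd byte). Assumptions: the later bytes of the real ID are ignored, the sequence starts at 0, and the 10,000-document segment is hypothetical; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch comparing two choices for the leading two bytes of an ID built from
 * a sequence counter. ASSUMPTIONS: only the first two bytes are modelled and
 * the sequence starts at 0; names here are hypothetical.
 */
final class PrefixSharing {

    /** Layout tried first: the two lowest bytes of the sequence id (bits 0-7, then 8-15). */
    static int prefixLowTwoBytes(int seq) {
        return ((seq & 0xff) << 8) | ((seq >>> 8) & 0xff);
    }

    /** Layout used: the 1st and 3rd bytes (bits 0-7, then 16-23), like the two uuidBytes lines above. */
    static int prefixFirstAndThirdByte(int seq) {
        return ((seq & 0xff) << 8) | ((seq >>> 16) & 0xff);
    }

    /** Smallest number of ids sharing any one 2-byte prefix among sequence ids 0..numDocs-1. */
    static int minIdsPerPrefix(int numDocs, boolean firstAndThird) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int seq = 0; seq < numDocs; seq++) {
            int prefix = firstAndThird ? prefixFirstAndThirdByte(seq) : prefixLowTwoBytes(seq);
            counts.merge(prefix, 1, Integer::sum);
        }
        int min = Integer.MAX_VALUE;
        for (int c : counts.values()) {
            min = Math.min(min, c);
        }
        return min;
    }

    public static void main(String[] args) {
        int numDocs = 10_000; // hypothetical segment size
        System.out.println("low two bytes:   min " + minIdsPerPrefix(numDocs, false) + " ids per prefix");
        System.out.println("1st + 3rd bytes: min " + minIdsPerPrefix(numDocs, true) + " ids per prefix");
    }
}
```

With 10,000 sequential ids, the low-two-bytes layout gives every prefix to a single id, while the 1st-and-3rd-byte layout shares each prefix among 39 or 40 ids, already above the ~30 terms the comment mentions. By the same arithmetic as the comment's 30*2^16, that layout reaches ~30 ids per prefix at roughly 30*2^8 = 7,680 documents.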
ennian001
(Ennian001)
January 31, 2019, 2:56am
system
(system)
Closed
February 28, 2019, 3:08am
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.