Performance impact of using a 100-character string as the _id in Elasticsearch

Harish_Kommaraju · January 3, 2016, 3:01pm

Hi,

I am planning to store events in Elasticsearch. The index can hold around 100 million events at any point in time. To de-dupe events, I am planning to build an _id of about 100 chars by concatenating the fields below:
entity_id - UUID (37 chars) +
event_creation_time (30 chars) +
event_type (30 chars)

This store will see normal reads and writes along with aggregate queries (no updates or deletes).
Can you please let me know whether there would be any performance impact or other side effects of using such a long string as the _id instead of the default auto-generated IDs?

Thanks,
Harish

warkolm · January 3, 2016, 10:37pm

This will help: http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html

mikemccand · January 4, 2016, 11:36am

A very long uid field is not great, especially if the uids can share a large common prefix: it will slow down the uid lookup that ES must do for every document you index or delete.

Harish_Kommaraju · January 5, 2016, 3:10pm

Thanks for your replies. Then is there any way of handling de-dupes on multiple columns during inserts?

loren · January 5, 2016, 5:48pm

One alternative that might be worth exploring/benchmarking is to use a hash of entity_id+event_creation_time+event_type. Or, if keeping them sequential is helpful, you could do event_creation_time+hash(entity_id+event_type). Also, I'd imagine those 30 chars for the timestamp could be expressed more compactly.
