Performance impact of using a 100-character string as the _id field in Elasticsearch


I am planning to store events in Elasticsearch. The index can hold around 100 million events at any point in time. To de-dupe events, I am planning to build an _id of length ~100 chars by concatenating the fields below:
entity_id - UUID (37 chars) +
event_creation_time (30 chars) +
event_type (30 chars)
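To make the scheme concrete, here is a minimal sketch of building such a concatenated _id. The field values and the `|` separator are illustrative assumptions, not part of any real schema:

```python
import uuid
from datetime import datetime, timezone

# Hypothetical event fields (names and values are illustrative only)
entity_id = str(uuid.uuid4())                                  # canonical UUID string
event_creation_time = datetime.now(timezone.utc).isoformat()   # ISO-8601 timestamp
event_type = "order_created"

# Concatenated _id as described above; total length is close to 100 chars
doc_id = f"{entity_id}|{event_creation_time}|{event_type}"
print(len(doc_id), doc_id)
```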

This store will have normal reads and writes along with aggregate queries (no updates/deletes).
Could you please let me know whether there would be any performance impact, or any other side effects, of using such long string _id values instead of the default auto-generated IDs?


This will help

A very long uid field is not great, especially if the uids can share a large common prefix: it will slow down the uid lookup that ES must do for every document you index or delete.

Thanks for your replies. Is there then any other way to handle de-duplication across multiple fields during inserts?

One alternative that might be worth exploring/benchmarking is to use a hash of entity_id + event_creation_time + event_type as the _id. Or, if keeping the ids roughly sequential is helpful, you could use event_creation_time + hash(entity_id + event_type). Also, I'd imagine those 30 chars for the timestamp could be expressed more compactly (e.g. as epoch milliseconds).
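The two variants above can be sketched as follows. The choice of SHA-1 and the `|` field separator are my own assumptions; any stable hash with enough bits to make collisions negligible at 100M documents would do:

```python
import hashlib

def make_event_id(entity_id: str, event_creation_time: str, event_type: str) -> str:
    """Fixed-length, deterministic _id from the de-dup key fields.

    The same input fields always produce the same _id, so indexing the
    same event twice overwrites the first copy instead of duplicating it.
    """
    key = f"{entity_id}|{event_creation_time}|{event_type}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()  # 40 hex chars, well-distributed

def make_sequential_event_id(entity_id: str, event_creation_time: str, event_type: str) -> str:
    """Time-prefixed variant: ids sort roughly by creation time but stay short."""
    suffix = hashlib.sha1(f"{entity_id}|{event_type}".encode("utf-8")).hexdigest()[:16]
    return f"{event_creation_time}-{suffix}"
```

You would then pass the result as the document id when indexing (e.g. the `id` argument of the Python client's `index` call), so a retried or replayed event simply replaces the existing document.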