Does enabling hash on the ip datatype speed up cardinality aggregation responses?


(Anindya Roy) #1

It makes sense to enable hashing on fields whose datatype is string, to speed up the response time of cardinality aggregations, as per https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-murmur3.html
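
For reference, a minimal sketch of the setup the linked page describes: a `murmur3` sub-field on a string field, with the cardinality aggregation pointed at that sub-field. The field name `message` and the aggregation name `distinct_values` are made up for illustration, and the mapping uses the typeless format of recent Elasticsearch versions, which may differ from the version in use here. The code only builds the request bodies and does not talk to a cluster.

```python
import json

# Hypothetical field name, chosen for illustration only.
FIELD = "message"


def murmur3_mapping(field):
    """Index mapping with a murmur3 sub-field (requires the mapper-murmur3 plugin)."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "keyword",
                    # Hashes are computed at index time and stored in <field>.hash
                    "fields": {"hash": {"type": "murmur3"}},
                }
            }
        }
    }


def cardinality_on_hash(field):
    """Search body that runs the cardinality aggregation on the hashed sub-field."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "distinct_values": {"cardinality": {"field": field + ".hash"}}
        },
    }


mapping_body = murmur3_mapping(FIELD)
query_body = cardinality_on_hash(FIELD)

print(json.dumps(mapping_body, indent=2))
print(json.dumps(query_body, indent=2))
```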

I am wondering whether there will be any performance improvement from enabling hashing on a field whose datatype is 'ip' (IP addresses), or whether distinct IP values are already treated as integers.

Also, does the hash computed on the field represent a running total of the distinct values seen for that field since the beginning of the data in the index?
E.g. say we saw 100 distinct values for a particular field in the first hour and then another 50 distinct values in the next hour. When I run a cardinality aggregation query with a range filter covering the last 30 mins, I only expect to see 50 and not 100 + 50 = 150.
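
To make the scenario concrete, here is a rough sketch of the kind of request in question, plus a tiny in-memory simulation of the example numbers. The field names `@timestamp` and `client_ip`, the aggregation name, and the sample data are all assumptions made for illustration. In Elasticsearch, aggregations are computed over the documents matched by the query, and the cardinality aggregation is approximate; the simulation below uses an exact set instead, and it places all 50 new values inside the last 30 minutes to match the example.

```python
from datetime import datetime, timedelta

# Hypothetical request body: range filter on the last 30 minutes,
# cardinality aggregation on an assumed "client_ip" field.
request_body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-30m"}}},
    "aggs": {"distinct_ips": {"cardinality": {"field": "client_ip"}}},
}


def distinct_in_window(docs, start, end):
    """Exact distinct count of values whose timestamp falls in [start, end)."""
    return len({ip for ts, ip in docs if start <= ts < end})


t0 = datetime(2020, 1, 1, 0, 0)
docs = []

# First hour: 100 distinct values.
for i in range(100):
    docs.append((t0 + timedelta(seconds=30 * i), "10.0.0.%d" % i))

# Second hour: 50 new distinct values, all within its last 30 minutes.
for i in range(50):
    docs.append((t0 + timedelta(hours=1, minutes=30, seconds=30 * i), "10.0.1.%d" % i))

window_start = t0 + timedelta(hours=1, minutes=30)
window_end = t0 + timedelta(hours=2)

last_30 = distinct_in_window(docs, window_start, window_end)
overall = distinct_in_window(docs, t0, window_end)

print("distinct in last 30 minutes:", last_30)
print("distinct over both hours:", overall)
```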


(Anindya Roy) #2

My apologies if this is a repeat question that has already been discussed in the past.
Any comments or thoughts @Mark_Harwood ?


(Mark Harwood) #3

The docs there give the example of long strings, and I can see how that might apply to the use case of finding exact duplicates of texts. IP addresses are much smaller and so are less likely to benefit from this technique.

No.


(Anindya Roy) #4

Thanks a lot