Does enabling hash on an ip field speed up the cardinality aggregation?

Anindya_Roy · April 12, 2019, 4:41pm

It makes sense to turn on hashing for fields whose datatype is string, to speed up the response time of the cardinality aggregation, as per https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-murmur3.html
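For readers who have not used the plugin, here is a minimal sketch of the request bodies involved, built as Python dicts so they can be printed as JSON. The index name `my-index`, the field name `my_field` and the aggregation name `distinct_values` are hypothetical. The typeless mapping shape assumes Elasticsearch 7.x; older versions nest `properties` under a type name.

```python
import json

# Hypothetical index and field names, used only for illustration.
INDEX = "my-index"
FIELD = "my_field"

# Mapping: a keyword field with a murmur3 sub-field that stores a hash
# of each value at index time (requires the mapper-murmur3 plugin).
mapping_body = {
    "mappings": {
        "properties": {
            FIELD: {
                "type": "keyword",
                "fields": {"hash": {"type": "murmur3"}},
            }
        }
    }
}

# Search: run the cardinality aggregation against the hashed sub-field.
search_body = {
    "size": 0,
    "aggs": {
        "distinct_values": {"cardinality": {"field": f"{FIELD}.hash"}}
    },
}

print(f"PUT /{INDEX}")
print(json.dumps(mapping_body, indent=2))
print(f"GET /{INDEX}/_search")
print(json.dumps(search_body, indent=2))
```

The aggregation targets `my_field.hash` rather than `my_field`, so the hashes computed at index time are reused instead of being recomputed at query time.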

Would there be any performance improvement from turning on hashing for a field whose datatype is 'ip' (IP addresses)?
Or is a distinct IP count already treated as an integer?

Also, does the hash computed on the field represent a running total of the distinct values seen for that field since the beginning of the data in the index?
E.g. say we saw 100 distinct values for a particular field in the first hour and then another 50 distinct values in the next hour. When I run a cardinality aggregation with a range filter covering the last 30 mins, I only expect to see 50, not a running total of 100 + 50 = 150.
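The scenario can be sketched in plain Python as below. This is an exact in-memory count, not Elasticsearch itself (whose cardinality aggregation is approximate), and the field names `@timestamp` and `my_field.hash` in the request body are assumptions for illustration.

```python
# Each tuple is (minutes since the index started, field value).
first_hour = [(i % 60, f"a{i}") for i in range(100)]       # 100 distinct values
second_hour = [(90 + i % 30, f"b{i}") for i in range(50)]  # 50 new distinct values
docs = first_hour + second_hour


def distinct_count(docs, since_minute):
    """Count distinct values among documents at or after since_minute."""
    return len({value for minute, value in docs if minute >= since_minute})


now = 120
print(distinct_count(docs, now - 30))  # last 30 minutes -> 50
print(distinct_count(docs, 0))         # whole index -> 150

# Equivalent search body (hypothetical field names): the range query
# restricts which documents the cardinality aggregation sees.
search_body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-30m"}}},
    "aggs": {"distinct_values": {"cardinality": {"field": "my_field.hash"}}},
}
```

The count depends only on the documents that pass the filter, so values seen earlier in the index do not carry over into the result.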

Anindya_Roy · April 15, 2019, 6:22pm

My apologies if this is a repeat of a question that has been discussed in the past.
Any comments or thoughts @Mark_Harwood ?

Mark_Harwood · April 15, 2019, 8:21pm

The docs there give the example of long strings, and I can see how that might apply to a use case like finding exact duplicates of texts. IP addresses are much smaller and so are less likely to benefit from this technique.
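As a rough illustration of the size difference (not a benchmark, and not a statement about how Elasticsearch stores `ip` values internally), the sketch below compares the number of bytes a hash function would have to read for an IP address versus a long text. The sample addresses and the text are made up.

```python
import ipaddress

# Binary forms of two sample addresses (hypothetical values).
ipv4 = ipaddress.ip_address("192.168.1.10").packed
ipv6 = ipaddress.ip_address("2001:db8::1").packed

# A made-up long text value of the kind the plugin docs have in mind.
long_text = ("lorem ipsum " * 500).encode("utf-8")

for label, raw in [("IPv4", ipv4), ("IPv6", ipv6), ("long text", long_text)]:
    print(f"{label}: {len(raw)} bytes to hash")
```

Hashing cost grows with input length, so pre-computing hashes saves far more work for values that are thousands of bytes long than for a 4- or 16-byte address.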

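No, the hash is not a running total. It is computed per field value at index time, so a cardinality aggregation only counts values from the documents that match your query.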

Anindya_Roy · April 15, 2019, 9:21pm

Thanks a lot
