Terms aggregations on hashcodes (Murmur3FieldMapper)

photonic_world_2 · July 25, 2016, 7:01pm

Hello,

I have an index (4 shards, 1 replica each) with a high ingestion rate (5k per s), distributed among 5 data nodes.

The problem I see is that when a terms aggregation is run on this index, the response time is very high.

Mapping:
"recipients": { "type": "string", "index": "not_analyzed", "fields": { "hash": { "type": "murmur3" } } },

Aggregation query:

"aggregations": { "bucket_agg": { "terms": { "field": "recipients.hash", "size": 5, "shard_size": 0 } } }

But aggregating on the hash code is much faster, which is expected: hash code fields are not strings, so they do not need global ordinals to be updated in field data on a heavily indexed index.

My problem is that when I aggregate on the hash code, the value returned by the aggregation is a hash code that I am not able to match to the hash code I generate from the original string. I used the Mapper code to generate the hash code. Can you please let me know whether Elasticsearch pads or otherwise modifies the hash code returned in the aggregation?

Thanks!

photonic_world_2 · July 25, 2016, 9:01pm

It looks like the hash code returned as part of the aggregation is padded with 0s on the right:

Returned by aggregation: 7532129326328174000
Returned by murmur3: 7532129326328173534

Why and how is the number rounded up?

nik9000 · July 25, 2016, 9:21pm

I dunno the answer to this one - I'm not super familiar with aggregations. Rounding a hash code is genuinely weird.

You'll probably have more luck if you just ask the question rather than ping someone directly.

photonic_world_2 · July 25, 2016, 9:24pm

Np.

Apologies. I just added you because I saw you as one of the committers of Mapper code.

Thanks again.

photonic_world_2 · July 26, 2016, 12:32am

This turns out to be a bug in the Elasticsearch Sense plugin, which rounded off the hash codes in the response.
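
For anyone hitting the same thing, here is a minimal sketch of the likely mechanism. It assumes the console parsed the response in JavaScript, where every number is a 64-bit double; the thread does not confirm how Sense handled it internally. A double has a 53-bit mantissa, so a 64-bit hash of this size can only be represented to the nearest multiple of 1024, and it is then printed with the shortest digits that round-trip. The helper name `as_js_number` is made up for this illustration.

```python
import json
from decimal import Decimal

# Hash value quoted earlier in this thread (as produced by murmur3).
MURMUR3_HASH = 7532129326328173534


def as_js_number(value: int) -> int:
    """Mimic a JavaScript client (assumption): store the value as a
    64-bit double, then print it with the shortest round-trip digits."""
    shortest = repr(float(value))  # shortest repr of the nearest double
    return int(Decimal(shortest))  # expand the exponent form to an integer


# What a double-based client would display.
rounded = as_js_number(MURMUR3_HASH)

# Python's json module parses integers exactly, so a client like this
# sees the same key that Elasticsearch actually returned.
exact = json.loads('{"key": 7532129326328173534}')["key"]

print("double-based client:", rounded)
print("exact integer parse:", exact)
print("difference:", rounded - exact)
```

The first printed value matches the rounded key reported above, while the exact parse matches the murmur3 output. Reading the raw response with a client that keeps 64-bit integers intact avoids the mismatch.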
