Hello,
I have an index (4 shards, 1 replica each) with a high ingestion rate (5k per second), distributed among 5 data nodes.
The problem I see is that when a terms aggregation is run on the recipients string field of this index, the response time is very high.
Mapping:
"recipients": { "type": "string", "index": "not_analyzed", "fields": { "hash": { "type": "murmur3" } } },
Aggregation query:
"aggregations": { "bucket_agg": { "terms": { "field": "recipients.hash", "size": 5, "shard_size": 0 } } }
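For reference, here is a minimal sketch of how the two request bodies being compared could be built and checked as valid JSON before sending. The helper name build_terms_agg is hypothetical and only illustrates the structure; the field names and the size and shard_size values are the ones from the mapping and query above.

```python
import json


def build_terms_agg(field, size=5, shard_size=0):
    """Return a request body with one terms aggregation on `field`.

    Hypothetical helper: it only mirrors the aggregation fragment above.
    """
    return {
        "aggregations": {
            "bucket_agg": {
                "terms": {
                    "field": field,
                    "size": size,
                    "shard_size": shard_size,
                }
            }
        }
    }


# Slow variant: aggregate on the not_analyzed string field.
body_string = build_terms_agg("recipients")

# Fast variant: aggregate on the murmur3 sub-field.
body_hash = build_terms_agg("recipients.hash")

# json.dumps always emits strict JSON, so no trailing commas can slip in.
print(json.dumps(body_hash))
```

Building the body as a dictionary and serializing it avoids hand-written JSON mistakes such as a stray trailing comma.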
Aggregating on the hash code, however, is much faster. That is expected: hash code fields are not strings, so global ordinals do not need to be updated in field data on a heavily indexed index.
My problem is that when I aggregate on the hash code, the values returned in the buckets are hash codes, and I am not able to match them to the hash code I generate myself from the original string. I used the mapper code to generate the hash code. Could you please let me know whether Elasticsearch pads the hash code returned in the aggregation or applies any other transformation to it?
Thanks!