Inconsistencies in value count aggregation

I am using the value_count aggregation over a fixed time interval (the data in the shards doesn't change within this interval). However, I get varying results each time I run the same query. Is it possible that I am getting an approximate count instead of an exact one?

p.s. My data is distributed across multiple nodes and shards.

{
	"size": 1,
	"timeout": "90s",
	"query": {
		"range": {
			"myfield": {
				"from": 1709584200000,
				"to": 1710133200000,
				"include_lower": true,
				"include_upper": true,
				"boost": 1.0
			}
		}
	},
	"aggregations": {
		"cnt": {
			"value_count": {
				"field": "myfield"
			}
		}
	}
}
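
For reference, a plain hit count over the same range (sketched below with a placeholder index name, not my production request) should return the same number, provided myfield holds exactly one value per document, since value_count counts values rather than documents:

// placeholder index name; gte/lte mirror the from/to range in the query above
GET myindex/_search
{
	"size": 0,
	"track_total_hits": true,
	"query": {
		"range": {
			"myfield": {
				"gte": 1709584200000,
				"lte": 1710133200000
			}
		}
	}
}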

My expectation is that this is an approximate count. That is called out for the terms aggregation (see: Terms Agg doc count error), and I'm surprised it isn't called out for value_count in the docs too. I'll go digging and see whether this is a gap in our docs.
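
For comparison, the terms aggregation makes its approximation visible: you can ask each bucket to report a doc_count_error_upper_bound. A minimal sketch, only for illustration (the index name is a placeholder):

// placeholder index name; asks each bucket to report its doc count error bound
GET myindex/_search
{
	"size": 0,
	"aggregations": {
		"terms_check": {
			"terms": {
				"field": "myfield",
				"show_term_doc_count_error": true
			}
		}
	}
}

The value_count response carries no such error field, only the value itself, which is part of why I'd expect the docs to say something if it were approximate.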

@Atefeh after talking with the team that maintains the value_count agg, it seems my initial assumption was wrong, and it should be an exact count, NOT approximate. It may be that you've found a bug, or that your cluster is having issues.

Can you share some more info?

  • what version of Elasticsearch are you using?
  • can you reproduce on another index? If so, can you provide the reproduction steps?
  • can you share the full output of the responses that differ, and show that there are no partial shard failures (see the _shards sketch after this list)?
  • can you check the Elasticsearch logs during the query and share any errors or warnings you find?
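
For the shard-failure check, the quickest thing to look at is the _shards object near the top of each search response. Something shaped like the sketch below in the responses that differ (the numbers here are invented, not from your cluster) would mean the result was built from partial data, and the failures array will say which shards failed and why:

"_shards": {
	"total": 12,
	"successful": 10,
	"skipped": 0,
	"failed": 2,
	"failures": [ "... per-shard reason appears here ..." ]
}

In a healthy response, failed is 0 and successful equals total.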

Thank you for your attention. I am using Elasticsearch v7.17.9.
You are right, there are some shard failures in the query response:

{
	"type": "circuit_breaking_exception",
	"reason": "[parent] Data too large, data for [indices:data/read/search[phase/query]] would be [35868302350/33.4gb], which is larger than the limit of [35701915648/33.2gb], real usage: [35868301824/33.4gb], new bytes reserved: [526/526b], usages [request=0/0b, fielddata=20312178761/18.9gb, in_flight_requests=1052/1kb, model_inference=0/0b, eql_sequence=0/0b, accounting=469451628/447.7mb]",
	"bytes_wanted": 35868302350,
	"bytes_limit": 35701915648,
	"durability": "PERMANENT"
}

I also observed a GC message on the node where the shard failed (GC did not bring memory usage down: before [35712048256], after [35712106128]).
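
The usages in that exception show fielddata alone holding roughly 18.9gb of the 33.2gb parent limit, so as a next step I am looking at the breaker and fielddata statistics on the affected node, roughly like this (read-only requests; output omitted here):

// per-breaker usage on each node
GET _nodes/stats/breaker

// which fields hold fielddata, per node
GET _cat/fielddata?v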
