I want to get the exact distinct count and the docs in one query

I am new to ES. I have a requirement to get the distinct values and an exact distinct count for up to 60,000 records from an Elasticsearch query, but my query always returns approximate counts only. Can someone help?

Hi @nandalapadu_shamriya, welcome to the community.
Could you please elaborate on this a bit more? Which DSL query are you using to do the count?

POST /index/_search
{
  "from": 1,
  "size": 1000,
  "timeout": "60s",
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "entity_type_code": {
              "value": 5,
              "boost": 1
            }
          }
        },
        {
          "term": {
            "country_id": {
              "value": 1,
              "boost": 1
            }
          }
        },
        {
          "term": {
            "period_type_code": {
              "value": 100,
              "boost": 1
            }
          }
        },
        {
          "terms": {
            "platform_type_code": [
              0
            ],
            "boost": 1
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "data_date": {
                    "value": "2020-04-13",
                    "boost": 1
                  }
                }
              }
            ],
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "brand_id",
      "brand_name",
      "sub_brand_id",
      "sub_brand_name",
      "asset_id",
      "asset_name",
      "asset_segment_id",
      "asset_segment_value",
      "category_id",
      "category_name",
      "subcategory_id",
      "subcategory_name",
      "entity_type_code",
      "entity_type_desc"
    ],
    "excludes": []
  },
  "sort": [
    {
      "asset_segment_value.keyword": {
        "order": "asc"
      }
    },
    {
      "asset_segment_value.keyword": {
        "order": "asc"
      }
    }
  ],
  "aggregations": {
    "asset_segment_id": {
      "cardinality": {
        "field": "asset_segment_id"
      }
    }
  },
  "collapse": {
    "field": "asset_segment_id"
  }
}

I used a cardinality aggregation to get the distinct count on the field asset_segment_id, but I am not getting the exact count. The number of docs returned by the query does not match the count calculated by cardinality.

Cardinality aggregations in Elasticsearch are always an approximation. Have a look at this thread for discussion on the topic.

Is there any other way to get an exact distinct count, apart from cardinality?

I read in the Elastic documentation that counts are accurate up to the precision threshold of 40000, but I still see differences of 2-3 between them. Please help.
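On the question of an exact alternative to cardinality: one option that is often suggested is to page through a composite aggregation on the field and count the buckets, since each bucket corresponds to one distinct value. Below is a minimal Python sketch of that idea, not a definitive implementation. The `search` argument is a hypothetical callable (an assumption, not something from this thread) that takes a request body as a dict and returns the parsed response; the aggregation name `distinct_values` and source name `value` are illustrative only.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes a search request body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_body(
    field: str,
    page_size: int,
    query: Optional[Dict[str, Any]] = None,
    after: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body with one composite aggregation over `field`."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after
    body: Dict[str, Any] = {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {"distinct_values": {"composite": composite}},
    }
    if query is not None:
        body["query"] = query
    return body


def count_distinct_exact(
    search: SearchFn,
    field: str,
    page_size: int = 1000,
    query: Optional[Dict[str, Any]] = None,
) -> int:
    """Count distinct values of `field` by paging through all composite buckets."""
    total = 0
    after: Optional[Dict[str, Any]] = None
    while True:
        response = search(composite_body(field, page_size, query, after))
        agg = response["aggregations"]["distinct_values"]
        buckets = agg["buckets"]
        total += len(buckets)
        after = agg.get("after_key")
        # Stop on an empty page or when no key is returned to continue from.
        if not buckets or after is None:
            break
    return total
```

The same `bool` filter from the query above could be passed as `query`, so the count covers exactly the documents being searched. The trade-off is several round trips instead of one, which for around 60,000 distinct values and a page size of 1000 means roughly 60 requests.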

Precision threshold is not the boundary between accurate and inaccurate. It is the threshold between one fuzzy counting technique and another. Below the precision threshold, the result is a count of distinct hashes, and two different values can occasionally collide on the same hash (this is relatively rare, but it makes the count come out slightly low).
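To illustrate why counting hashes can undercount, here is a small Python sketch. It is not Elasticsearch's actual implementation: the hash function (MD5), the value names, and the 16-bit width are assumptions, with the width deliberately tiny so that collisions become visible. Real hashes are much wider, so collisions are far rarer, but the mechanism is the same.

```python
import hashlib
from typing import Iterable


def truncated_hash(value: str, bits: int) -> int:
    """Hash `value` and keep only the top `bits` bits (1 <= bits <= 64)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def count_by_hashes(values: Iterable[str], bits: int) -> int:
    """Estimate the distinct count as the number of distinct hashes."""
    return len({truncated_hash(v, bits) for v in values})


# Illustrative data: 60,000 distinct made-up identifiers.
values = [f"asset-{i}" for i in range(60000)]

exact = len(set(values))                 # true distinct count
narrow = count_by_hashes(values, 16)     # only 65,536 possible hashes: many collisions
wide = count_by_hashes(values, 64)       # collisions become extremely unlikely

print(exact, narrow, wide)
```

Whenever two different values share a hash they are counted once, so a hash-based count can only ever be equal to or lower than the true count, never higher.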

Thank you for your reply. I have tried different approaches, including running the query without the collapse field, but they all return the same result. Without a precision threshold the gap is large; with one set, the difference is up to 10. Could it depend on how the shards were created and how the data was ingested?
