avnere
(Avner)
March 15, 2022, 2:46pm
1
Hi,
I am looking for precise count results for index with over 70 millions documents.
Equal to this: SQL : select count(distinct column) from table;
I already tried Terms-aggregation, composite-aggregation, cardinality-aggregation. But didn't get exact count.
What is the best approach for it?
dadoonet
(David Pilato)
March 15, 2022, 3:03pm
2
If you want to count the number of documents, try:
GET /your-index/_search?track_total_hits=true&size=0
avnere
(Avner)
March 15, 2022, 3:42pm
3
I want to count distinct value of field.
something like: select count(distinct name) from table;
dadoonet
(David Pilato)
March 15, 2022, 3:55pm
4
So terms aggregation is the way to go.
avnere
(Avner)
March 15, 2022, 4:46pm
5
The index with over 70 millions documents. I already tried Terms-aggregation, composite-aggregation, cardinality-aggregation. But didn't get exact count.
dadoonet
(David Pilato)
March 15, 2022, 6:03pm
6
Could you share the request and the response please? With a terms
agg.
avnere
(Avner)
March 15, 2022, 6:28pm
7
GET abc/_search
{
"aggs":{
"unique_fields": {
"terms": {
"field": "source",
"size": 10000,
"order": {
"_key": "asc"
}
}
}
},
"size": 0
}
In order to compare I ran count on each "source" field and saw that the values are not equal.
dadoonet
(David Pilato)
March 15, 2022, 7:11pm
8
Could you share the output?
I'd like to see the first part (with some first terms)
avnere
(Avner)
March 15, 2022, 8:31pm
9
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"unique_fields" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "FIFO",
"doc_count" : 13
},
{
"key" : "FW",
"doc_count" : 77757759
},
{
"key" : "HW",
"doc_count" : 12394396
},
{
"key" : "Info",
"doc_count" : 6705
}
]
}
}
}
dadoonet
(David Pilato)
March 15, 2022, 10:48pm
10
The numbers you are seeing are the exact numbers. What makes you think that they are not?
casterQ
(caster)
March 16, 2022, 2:30am
11
terms agg, you can use shard_size
cardinality agg, you can use precision_threshold
avnere
(Avner)
March 16, 2022, 7:48am
12
When running count on specific field I get different number.
For Example:
When running the following I get count 6561, while with terms aggs I got 6705.
GET baseelements_rwr/_count
{
"query": {
"bool": {
"must": [
{"match": {"source": "Info"}}
]
}
}
}
Cardinality aggregation, like some other aggregation types, are by design approximations. This is generally highligted in the documentation. I recommend reading the explanation and discussion provided in this thread as well as this one .
dadoonet
(David Pilato)
March 16, 2022, 9:48am
14
Could you try with:
GET baseelements_rwr/_count
{
"query": {
"bool": {
"must": [
{"term": {"source": "Info"}}
]
}
}
}
What is the mapping for this source
field?
What does a typical document look like?
avnere
(Avner)
March 16, 2022, 10:34am
15
source field is keyword.
Same results when switching to term instead match.
dadoonet
(David Pilato)
March 16, 2022, 11:09am
16
Could you share a typical document please?
system
(system)
Closed
April 13, 2022, 11:09am
17
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.