avnere  
                (Avner)
               
                 
              
                  
                    March 15, 2022,  2:46pm
                   
                   
              1 
               
             
            
              Hi,
I am looking for precise count results for index with over 70 millions documents. 
Equal to this: SQL : select count(distinct column) from table;
I already tried Terms-aggregation, composite-aggregation, cardinality-aggregation. But didn't get exact count.
What is the best approach for it?
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 15, 2022,  3:03pm
                   
                   
              2 
               
             
            
              If you want to count the number of documents, try:
GET /your-index/_search?track_total_hits=true&size=0
 
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 15, 2022,  3:42pm
                   
                   
              3 
               
             
            
              I want to count distinct value of field. 
something like: select count(distinct name) from table;
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 15, 2022,  3:55pm
                   
                   
              4 
               
             
            
              So terms aggregation  is the way to go.
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 15, 2022,  4:46pm
                   
                   
              5 
               
             
            
              The index with over 70 millions documents. I already tried Terms-aggregation, composite-aggregation, cardinality-aggregation. But didn't get exact count.
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 15, 2022,  6:03pm
                   
                   
              6 
               
             
            
              Could you share the request and the response please? With a terms agg.
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 15, 2022,  6:28pm
                   
                   
              7 
               
             
            
              GET abc/_search
{
"aggs":{
        "unique_fields": {
            "terms": {
                "field": "source",
                "size": 10000,
                "order": {
                  "_key": "asc"
                }
            }
        }
    },
    "size": 0
}
 
In order to compare I ran count on each "source" field and saw that the values are not equal.
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 15, 2022,  7:11pm
                   
                   
              8 
               
             
            
              Could you share the output? 
I'd like to see the first part (with some first terms)
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 15, 2022,  8:31pm
                   
                   
              9 
               
             
            
              {
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "unique_fields" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "FIFO",
          "doc_count" : 13
        },
        {
          "key" : "FW",
          "doc_count" : 77757759
        },
        {
          "key" : "HW",
          "doc_count" : 12394396
        },
        {
          "key" : "Info",
          "doc_count" : 6705
        }
      ]
    }
  }
}
 
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 15, 2022, 10:48pm
                   
                   
              10 
               
             
            
              The numbers you are seeing are the exact numbers. What makes you think that they are not?
             
            
               
               
               
            
            
           
          
            
              
                casterQ  
                (caster)
               
                 
              
                  
                    March 16, 2022,  2:30am
                   
                   
              11 
               
             
            
              terms agg,  you can use shard_size
cardinality agg, you can use precision_threshold 
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 16, 2022,  7:48am
                   
                   
              12 
               
             
            
              When running count on specific field I get different number.
For Example: 
When running the following I get count 6561, while with terms aggs I got 6705.
GET baseelements_rwr/_count
{
  "query": {
    "bool": {
        "must": [
            {"match": {"source": "Info"}}
        ]
    }
  }
}
 
             
            
               
               
               
            
            
           
          
            
            
              Cardinality aggregation, like some other aggregation types, are by design approximations. This is generally highligted in the documentation. I recommend reading the explanation and discussion provided in this thread  as well as this one .
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 16, 2022,  9:48am
                   
                   
              14 
               
             
            
              Could you try with:
GET baseelements_rwr/_count
{
  "query": {
    "bool": {
        "must": [
            {"term": {"source": "Info"}}
        ]
    }
  }
}
 
What is the mapping for this source field? 
What does a typical document look like?
             
            
               
               
               
            
            
           
          
            
              
                avnere  
                (Avner)
               
              
                  
                    March 16, 2022, 10:34am
                   
                   
              15 
               
             
            
              source field is keyword.
Same results when switching to term instead match.
             
            
               
               
               
            
            
           
          
            
              
                dadoonet  
                (David Pilato)
               
              
                  
                    March 16, 2022, 11:09am
                   
                   
              16 
               
             
            
              Could you share a typical document please?
             
            
               
               
               
            
            
           
          
            
              
                system  
                (system)
                  Closed 
               
              
                  
                    April 13, 2022, 11:09am
                   
                   
              17 
               
             
            
              This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.