Get all distinct values of a field (more than 10k values)

Hi there,
This question has probably been asked many times, but I haven't seen an answer that fits my case. I'm looking for the best way to get all the distinct values of a field across a group of indices. I managed to write a script that does this, but it doesn't feel right to me, so I'm asking the experts for help.

partitions=25
rm -f run.*
for (( i=0;i<$partitions;i++))
do
  curl -s -u user:password 'http://10.x.x.x:9200/index-*/_search?pretty' -H 'Content-Type: application/json' -d"
  {
     \"size\": 0,
     \"aggs\": {
        \"expired_sessions\": {
           \"terms\": {
              \"field\": \"data.device.deviceid\",
              \"include\": {
                 \"partition\": $i,
                 \"num_partitions\": $partitions
              },
              \"size\": 10000
           }
        }
     }
  }
  " > run.$i
done
cat run.* | jq '.aggregations.expired_sessions.buckets[].key'

In fact, the number of devices differs from what I get when I run a cardinality query:

GET index-*/_search
{
  "size": 0,
  "aggs": {
    "count": {
      "cardinality": {
        "field": "data.device.deviceid"
      }
    }
  }
}

The cardinality query returns 62102, whereas the terms aggregation returns 61990.

A couple of points:

  1. The cardinality aggregation is, by design, approximate - see the docs.
  2. If you don’t need to sort the terms by a child aggregation, the ‘composite’ aggregation is probably simpler than using the ‘terms’ aggregation with partitioning.

Could you please give me an example of a composite aggregation that does this? I can't figure it out.

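The "after" parameter is what allows you to page through the composite aggregation: take the after_key returned with each response and pass it as "after" in the next request.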
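
A minimal sketch of that paging loop, in the same curl + jq style as the script above. The aggregation name "devices", the source name "deviceid", the page size of 1000, and the helper names composite_body and fetch_all_keys are assumptions made for illustration; the host and credentials are the placeholders from the question. It has not been run against the cluster in this thread.

```shell
#!/usr/bin/env bash
# Sketch: list every distinct data.device.deviceid with a composite aggregation.
# ES_URL, the credentials and PAGE_SIZE are placeholders / assumptions.
ES_URL='http://10.x.x.x:9200/index-*/_search'
PAGE_SIZE=1000

# Print the request body. $1 is the after_key object from the previous page
# (leave it empty for the first request).
composite_body() {
  local after=''
  if [ -n "$1" ]; then
    after=", \"after\": $1"
  fi
  printf '{"size":0,"aggs":{"devices":{"composite":{"size":%s,"sources":[{"deviceid":{"terms":{"field":"data.device.deviceid"}}}]%s}}}}\n' "$PAGE_SIZE" "$after"
}

# Request page after page until a response no longer carries an after_key.
fetch_all_keys() {
  local after='' page
  while :; do
    page=$(curl -s -u user:password "$ES_URL" \
      -H 'Content-Type: application/json' \
      -d "$(composite_body "$after")") || return 1
    # Print the keys of this page, one per line.
    printf '%s\n' "$page" | jq -r '.aggregations.devices.buckets[].key.deviceid'
    # after_key is missing once there are no more buckets, which ends the loop.
    after=$(printf '%s\n' "$page" | jq -c '.aggregations.devices.after_key // empty')
    [ -n "$after" ] || break
  done
}
```

Running fetch_all_keys prints one device ID per line, so `fetch_all_keys | wc -l` gives a count that can be compared with the cardinality result. Unlike the partitioned terms approach, there is no need to guess a partition count up front; the loop simply stops when the response has no after_key.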
