How to build a terms agg with a high-cardinality field and a low-cardinality field

Hello,
There are two fields in my docs, cid and os. cid is a high-cardinality field, while os is
either android or ios.
I want to know the difference between the following two aggregations, and which one is better.

GET _search
{
  "size": 0,
  "aggs": {
    "cid": {
      "terms": {
        "field": "cid",
        "size": 1000
      },
      "aggs": {
        "os": {
          "terms": {
            "field": "os"
          }
        }
      }
    }
  }
}

GET _search
{
  "size": 0,
  "aggs": {
    "os": {
      "terms": {
        "field": "os"
      },
      "aggs": {
        "cid": {
          "terms": {
            "field": "cid",
            "size": 1000
          }
        }
      }
    }
  }
}

Neither one is better in general, because the two requests return different data.

The first request returns the top 1000 cids (ordered by document count, the default for a terms aggregation) and, for each of them, a document count per operating system, android or ios. You will therefore see at most 1000 different cids.

The second request, however, can return more than 1000 different cids: it returns the top 1000 cids for android and the top 1000 cids for ios, and those two sets might be completely different, giving up to 2000 unique cids.
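The difference can be reproduced outside Elasticsearch with a small pure-Python sketch. The documents, the tiny size of 2, and the helper names top_terms, cid_then_os and os_then_cid are all invented for illustration; the code only mimics the "top N terms by document count" behavior on a single shard and is not how Elasticsearch computes the result.

```python
from collections import Counter

# Hypothetical documents as (cid, os) pairs, invented for illustration.
DOCS = (
    [("a", "android")] * 4
    + [("b", "android")] * 3
    + [("c", "ios")] * 2
    + [("d", "ios")] * 1
)


def top_terms(values, size):
    """Return the `size` most frequent values as (term, count) pairs."""
    counts = Counter(values)
    # Highest count first; ties broken by term so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


def cid_then_os(docs, size):
    """Mimic the first request: top cids, then an os breakdown per cid."""
    result = {}
    for cid, _ in top_terms([c for c, _ in docs], size):
        # 10 mirrors the default size of a terms aggregation.
        result[cid] = dict(top_terms([o for c, o in docs if c == cid], 10))
    return result


def os_then_cid(docs, size):
    """Mimic the second request: os buckets, then top cids per os."""
    result = {}
    for os_name, _ in top_terms([o for _, o in docs], 10):
        result[os_name] = dict(
            top_terms([c for c, o in docs if o == os_name], size)
        )
    return result


SIZE = 2  # stands in for "size": 1000
outer_cid = cid_then_os(DOCS, SIZE)
outer_os = os_then_cid(DOCS, SIZE)

unique_cids_first = set(outer_cid)
unique_cids_second = {cid for buckets in outer_os.values() for cid in buckets}

print(sorted(unique_cids_first))   # ['a', 'b']
print(sorted(unique_cids_second))  # ['a', 'b', 'c', 'd']
```

With size set to 2, the cid->os nesting reports 2 cids while the os->cid nesting reports 4, which is the same effect as 1000 versus up to 2000 in the requests above.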

Hope that answers the question.

Thanks.
If the number of unique cids is under 1000, both requests return the same information. In that situation, which is better?

If a cid typically has only one os, the os->cid form will use fewer bytes than the cid->os form to convey the same information, because cid->os wraps a nested os aggregation around every single cid bucket.
Either way, watch out for non-zero values of doc_count_error_upper_bound in the results. If you see them, consider increasing the shard_size setting to trade RAM for accuracy.
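As a rough sketch of that check, the function below walks the "aggregations" part of a search response (already parsed into a Python dict) and reports every terms aggregation whose doc_count_error_upper_bound is non-zero. The sample response, the helper name find_inexact_terms and the shard_size value of 5000 are assumptions made up for illustration; only the names doc_count_error_upper_bound, buckets and shard_size come from the terms aggregation itself.

```python
def find_inexact_terms(aggs, path=()):
    """Yield (path, bound) for each aggregation with a non-zero error bound."""
    for name, agg in aggs.items():
        if not isinstance(agg, dict):
            continue
        here = path + (name,)
        bound = agg.get("doc_count_error_upper_bound", 0)
        if bound:
            yield "/".join(here), bound
        buckets = agg.get("buckets", [])
        if not isinstance(buckets, list):
            continue
        for bucket in buckets:
            # Sub-aggregations sit next to "key" and "doc_count" in a bucket.
            subs = {k: v for k, v in bucket.items() if isinstance(v, dict)}
            yield from find_inexact_terms(subs, here + (str(bucket.get("key")),))


# Hypothetical response fragment for the os->cid request (numbers invented).
SAMPLE_AGGS = {
    "os": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {"key": "android", "doc_count": 500,
             "cid": {"doc_count_error_upper_bound": 7,
                     "sum_other_doc_count": 120,
                     "buckets": [{"key": "a", "doc_count": 40}]}},
            {"key": "ios", "doc_count": 300,
             "cid": {"doc_count_error_upper_bound": 0,
                     "sum_other_doc_count": 0,
                     "buckets": [{"key": "c", "doc_count": 25}]}},
        ],
    }
}

inexact = list(find_inexact_terms(SAMPLE_AGGS))
print(inexact)  # [('os/android/cid', 7)]

# If anything is reported, one option is to retry with a larger shard_size.
# 5000 is an arbitrary illustrative value; larger values cost more memory.
REQUEST_BODY = {
    "size": 0,
    "aggs": {
        "os": {
            "terms": {"field": "os"},
            "aggs": {
                "cid": {
                    "terms": {"field": "cid", "size": 1000, "shard_size": 5000}
                }
            },
        }
    },
}
```

The request body is the os->cid request from the question with shard_size added to the cid terms aggregation; how large a value is needed depends on the data and the number of shards.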
