Can I group by a field and ignore the buckets?

I am using Elasticsearch 1.5.

Here are my mappings:

config = {
    "mappings": {
        my_doc_type: {
            "dynamic": False,
            "properties": {
                "timestamp_start": {
                    "type": "date"
                },
                "timestamp_end": {
                    "type": "date"
                },

                "entity_id": {
                    "type": "string",
                    "index": "not_analyzed"
                },

                "version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
            }
        },
    ...
    }
}

I am trying to perform a query like this:

{
  "aggregations": {
    "by_version": {
      "aggregations": {
        "by_entity_id": {
          "terms": {
            "field": "entity_id"
          }
        }
      },
      "terms": {
        "field": "version"
      }
    }
  },
  "from": 0,
  "size": 0
}

Here is what I get:

{
  "aggregations": {
    "by_version": {
      "buckets": [
        {
          "by_entity_id": {
            "buckets": [
              {
                "doc_count": 480,
                "key": "4bcf"
              },
              {
                "doc_count": 480,
                "key": "60965392"
              },
              {
                "doc_count": 480,
                "key": "73ba"
              },
              {
                "doc_count": 480,
                "key": "bb1f"
              },
              {
                "doc_count": 480,
                "key": "ff0cf25f5480"
              }
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0
          },
          "doc_count": 480,
          "key": "2.2.0"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "items": [],
  "total": 480
}

What I am going for is:

  • group my documents by entity_id (I don't care how many documents each group contains; one document per entity_id is all I need)
  • group those entity_id groups into buckets based on the version field (so the count for each version bucket equals the number of distinct entity_id values, one per entity_id group)

I don't really need the buckets for my by_entity_id terms aggregation, since there will be tens of thousands of them. I'm using a terms aggregation here only because I don't know a better way. All I want is the number of unique 'entity_id' values for each version.
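To make the goal concrete, here is a minimal sketch in plain Python of the result I'm after, computed client-side. The function name and the sample documents are made up for illustration; they are not real data from my index.

```python
from collections import defaultdict


def unique_entities_per_version(docs):
    """Return {version: number of distinct entity_id values}."""
    seen = defaultdict(set)
    for doc in docs:
        # A set keeps each entity_id once, however many documents share it.
        seen[doc["version"]].add(doc["entity_id"])
    return {version: len(ids) for version, ids in seen.items()}


# Hypothetical sample documents (illustration only).
sample_docs = [
    {"version": "2.2.0", "entity_id": "4bcf"},
    {"version": "2.2.0", "entity_id": "4bcf"},
    {"version": "2.2.0", "entity_id": "73ba"},
    {"version": "2.1.0", "entity_id": "bb1f"},
]

result = unique_entities_per_version(sample_docs)
print(result)
```

This is the per-version number I'd like Elasticsearch to compute for me, without returning every entity_id bucket.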

I feel like I'm abusing terms aggregations or that there's a much better way to do this. Could I use a cardinality aggregation somehow? The Sum and Value Count aggregations don't really seem helpful here.
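For what it's worth, here is a rough sketch of what I imagine the cardinality version would look like: a terms aggregation on version with a cardinality sub-aggregation on entity_id. The aggregation name "unique_entities" and both helper functions are my own invention, and the sample response is hypothetical. As far as I understand, cardinality counts are approximate, and precision_threshold (which I believe is capped at 40000) trades memory for accuracy.

```python
def build_unique_entity_query(precision_threshold=40000):
    """Build a request body: one bucket per version, each carrying
    an (approximate) count of distinct entity_id values."""
    return {
        "size": 0,  # no hits needed, only the aggregations
        "aggregations": {
            "by_version": {
                "terms": {"field": "version"},
                "aggregations": {
                    # "unique_entities" is a name I picked for this sketch.
                    "unique_entities": {
                        "cardinality": {
                            "field": "entity_id",
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


def counts_from_response(response):
    """Flatten the aggregation response into {version: unique count}."""
    buckets = response["aggregations"]["by_version"]["buckets"]
    return {b["key"]: b["unique_entities"]["value"] for b in buckets}


# Hypothetical response shape (assumed, not actual output from my cluster).
sample_response = {
    "aggregations": {
        "by_version": {
            "buckets": [
                {"key": "2.2.0", "doc_count": 480, "unique_entities": {"value": 5}}
            ]
        }
    }
}

query = build_unique_entity_query()
counts = counts_from_response(sample_response)
print(counts)
```

If this works the way I think it does, each version bucket would carry a single number instead of tens of thousands of entity_id buckets.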

Thanks for any help you can provide!
