Can I group by a field and ignore the buckets?


(Jong) #1

I am using Elasticsearch 1.5.

Here are my mappings:

config = {
    "mappings": {
        my_doc_type: {
            "dynamic": False,
            "properties": {
                "timestamp_start": {
                    "type": "date"
                },
                "timestamp_end": {
                    "type": "date"
                },

                "entity_id": {
                    "type": "string",
                    "index": "not_analyzed"
                },

                "version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
            }
        },
    ...
    }
}

I am trying to perform a query like this:

{
  "aggregations": {
    "by_version": {
      "aggregations": {
        "by_entity_id": {
          "terms": {
            "field": "entity_id"
          }
        }
      },
      "terms": {
        "field": "version"
      }
    }
  },
  "from": 0,
  "size": 0
}

Here is what I get:

{
  "aggregations": {
    "by_version": {
      "buckets": [
        {
          "by_entity_id": {
            "buckets": [
              {
                "doc_count": 480,
                "key": "4bcf"
              },
              {
                "doc_count": 480,
                "key": "60965392"
              },
              {
                "doc_count": 480,
                "key": "73ba"
              },
              {
                "doc_count": 480,
                "key": "bb1f"
              },
              {
                "doc_count": 480,
                "key": "ff0cf25f5480"
              }
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0
          },
          "doc_count": 480,
          "key": "2.2.0"
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "items": [],
  "total": 480
}

What I am going for is:

  • group my documents by entity_id (I don't care how many there are, a bucket size of 1 is all I need)
  • group these entity_id buckets into buckets based on the version field (the number of documents in this bucket would be equal to 1 document per entity_id bucket)
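
To make the goal concrete, here is a plain-Python sketch of the number I am after, computed outside Elasticsearch. The sample documents, the second version value "2.1.0", and the function name are made up for illustration; only the field names come from my mapping.

```python
from collections import defaultdict

# Hypothetical sample documents; only the two fields that matter are shown.
docs = [
    {"entity_id": "4bcf", "version": "2.2.0"},
    {"entity_id": "4bcf", "version": "2.2.0"},  # same entity seen twice
    {"entity_id": "73ba", "version": "2.2.0"},
    {"entity_id": "bb1f", "version": "2.1.0"},
]


def unique_entities_per_version(documents):
    """Count distinct entity_id values for each version."""
    seen = defaultdict(set)
    for doc in documents:
        seen[doc["version"]].add(doc["entity_id"])
    # Only the size of each set is needed, not the ids themselves.
    return {version: len(ids) for version, ids in seen.items()}


result = unique_entities_per_version(docs)
print(result)
```

The per-entity buckets are thrown away here; only the count per version survives, which is what I would like the aggregation to return.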

I don't really need the buckets from my by_entity_id terms aggregation; there will be tens of thousands of them. I'm only using a terms aggregation here because I don't know a better way. All I want is the number of unique 'entity_id' values for each version.

I feel like I'm abusing terms aggregations or that there's a much better way to do this. Could I use a cardinality aggregation somehow? The Sum and Value Count aggregations don't really seem helpful here.
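
If cardinality is the right tool, I imagine the request body would look roughly like the sketch below: the same by_version terms aggregation, with a cardinality sub-aggregation on entity_id in place of the nested terms aggregation. This is untested against my index. The sub-aggregation name "unique_entities" and the helper function are my own placeholders. As far as I understand, cardinality returns an approximate count, and precision_threshold (capped at 40000 in the 1.x line) trades memory for accuracy.

```python
import json


def build_unique_entities_query(precision_threshold=40000):
    """Build a request body that counts distinct entity_id values per version.

    "unique_entities" is a placeholder name, not something from my mapping.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggregations": {
            "by_version": {
                "terms": {"field": "version"},
                "aggregations": {
                    "unique_entities": {
                        # Approximate distinct count; no per-entity buckets.
                        "cardinality": {
                            "field": "entity_id",
                            "precision_threshold": precision_threshold,
                        }
                    }
                },
            }
        },
    }


query = build_unique_entities_query()
print(json.dumps(query, indent=2))
```

Each by_version bucket should then carry a single unique_entities value instead of tens of thousands of by_entity_id buckets.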

Thanks for any help you can provide!

