Aggregation over an aggregation

Hello,

I have some entries with the following form:

{
  "categoryId": "categoryName",
  "elementId": "elementId",
  "state": "STATE"
}

The state can be "INIT", "RUN", "DONE" or "ERROR" (the first three occur in this order).

I'd like to know, for each category, which state each element is in. For example, for the following (categoryId, elementId, state) tuples:

("category1", "element1", "INIT")
("category1", "element1", "RUN")
("category1", "element2", "INIT")
("category2", "element1", "INIT")
("category2", "element2", "RUN")

I'd like to get the following (categoryId, state, count) tuples:

("category1", "INIT", 1)
("category1", "RUN", 1)
("category2", "INIT", 1)
("category2", "RUN", 1)

So far, I have found how to get the last state of each element (for example, an element inside a category that has two entries, INIT and RUN, must be counted as RUN only), but I don't know how to aggregate these results afterwards. This is my request:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "state": {"value": "INIT", "boost": 10}
          }
        },
        {
          "term": {
            "state": {"value": "RUN", "boost": 20}
          }
        },
        {
          "term": {
            "state": {"value": "DONE", "boost": 30}
          }
        },
        {
          "term": {
            "state": {"value": "ERROR", "boost": 40}
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "states": {
      "terms": {"field": "categoryId"},
      "aggs": {
        "elements": {
          "terms": {"field": "elementId"},
          "aggs": {
            "state": {
              "top_hits": {"size": 1}
            }
          }
        }
      }
    }
  }
}

I've tried playing with pipeline aggregations, but I'm not sure how to use them here. Is there a way to aggregate these states further, so that I get the number of elements per state for each category?

I'm not sure my problem is clearly explained, so don't hesitate to ask for more details if it isn't!

Thanks,

In general, Elasticsearch doesn't support aggregations in which the handling of one document depends on the value of another document. So I'm afraid there is no way to do this in Elasticsearch directly. You might need some client-side logic to first figure out the current state for each category, and then another request to aggregate documents for each of these (category, state) pairs.
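To make the client-side step concrete, here is a minimal sketch in Python. It assumes the entries can be read one by one as dicts with the three fields from the question. The names STATE_RANK and count_latest_states are hypothetical, and ranking ERROR above DONE is an assumption that simply mirrors the boosts in the query above.

```python
from collections import Counter

# Assumed ordering of states; ERROR ranked last mirrors the boosts in the query.
STATE_RANK = {"INIT": 0, "RUN": 1, "DONE": 2, "ERROR": 3}


def count_latest_states(entries):
    """Return a Counter mapping (categoryId, state) to a number of elements."""
    latest = {}
    for entry in entries:
        key = (entry["categoryId"], entry["elementId"])
        state = entry["state"]
        # Keep only the highest-ranked state seen so far for this element.
        if key not in latest or STATE_RANK[state] > STATE_RANK[latest[key]]:
            latest[key] = state
    # Second pass: count elements per (category, state) pair.
    return Counter((category, state) for (category, _element), state in latest.items())


# The example tuples from the question.
entries = [
    {"categoryId": "category1", "elementId": "element1", "state": "INIT"},
    {"categoryId": "category1", "elementId": "element1", "state": "RUN"},
    {"categoryId": "category1", "elementId": "element2", "state": "INIT"},
    {"categoryId": "category2", "elementId": "element1", "state": "INIT"},
    {"categoryId": "category2", "elementId": "element2", "state": "RUN"},
]

counts = count_latest_states(entries)
for (category, state), n in sorted(counts.items()):
    print(category, state, n)
```

The function accepts any iterable, so the entries do not have to be loaded into a list first. It keeps one state per (category, element) pair, so memory grows with the number of distinct elements rather than with the number of entries.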

This is what I feared, but it doesn't seem feasible on the client side either, because of the number of entries involved :-/
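One possible way to limit what reaches the client is to fold the response of the request above instead of reading every entry, since it carries a single top hit per element. The sketch below is in Python and assumes the usual response layout for terms and top_hits aggregations; fold_response and the sample dict are hypothetical, and the sample is hand-written rather than real output. Note that a terms aggregation returns only 10 buckets by default, so "size" would have to be raised on both terms aggregations for larger data sets.

```python
from collections import Counter


def fold_response(response):
    """Count elements per (categoryId, state) from the nested aggregation response."""
    counts = Counter()
    for category in response["aggregations"]["states"]["buckets"]:
        for element in category["elements"]["buckets"]:
            # top_hits with size 1 yields exactly one hit: the highest-scoring entry.
            top = element["state"]["hits"]["hits"][0]
            counts[(category["key"], top["_source"]["state"])] += 1
    return counts


# Hand-written sample in the assumed response shape, trimmed to the fields used.
sample = {
    "aggregations": {
        "states": {
            "buckets": [
                {
                    "key": "category1",
                    "elements": {
                        "buckets": [
                            {"key": "element1",
                             "state": {"hits": {"hits": [{"_source": {"state": "RUN"}}]}}},
                            {"key": "element2",
                             "state": {"hits": {"hits": [{"_source": {"state": "INIT"}}]}}},
                        ]
                    },
                }
            ]
        }
    }
}

folded = fold_response(sample)
for (category, state), n in sorted(folded.items()):
    print(category, state, n)
```

This still moves one hit per element over the wire, so it only helps when the number of distinct elements is much smaller than the number of entries.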
