Finding number of files in latest batch using aggregations

Hello,

Here is the scenario:

I have an index that contains documents representing files uploaded to a file store. These documents are uploaded in batches and each doc is tagged with a batch_id field as well as the time they are uploaded. They also have a status field that tracks if the file pass/failed during the ingestion.

Some example docs:

{
          "batch_id" : "a",
          "file_name" : "file_1_from_batch_a",
          "@timestamp" : "2021-10-12T18:12:54.331Z",
          "status" : "success"
 }
{
          "batch_id" : "a",
          "file_name" : "file_2_from_batch_a",
          "@timestamp" : "2021-10-12T00:00:00.000Z",
          "status" : "success"
}
{
          "batch_id" : "b",
          "file_name" : "file_1_from_batch_b",
          "@timestamp" : "2021-10-13T18:13:00.000Z",
          "status" : "failure"
}
{
          "batch_id" : "b",
          "file_name" : "file_2_from_batch_b",
          "@timestamp" : "2021-10-13T18:10:22.450Z",
          "status" : "failure"
}

I wish to perform an aggregation query over the index to find out how many failures have occurred in the latest batch of files.

Here's what I've come up with so far, but sadly its not giving the right answer

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "most_recent" : {
      "terms": {
        "field" : "@timestamp",
        "order": { "_term": "desc" },
        "size": 1
      },
      "aggs": {
        "execution_id": {
          "terms": {
            "field": "batch_id.keyword"
          },
          "aggs": {
            "failures": {
              "filter": {"term": {"status.keyword": "failure"}}
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "most_recent" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3,
      "buckets" : [
        {
          "key" : 1634148780000,
          "key_as_string" : "2021-10-13T18:13:00.000Z",
          "doc_count" : 1,
          "execution_id" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "b",
                "doc_count" : 1,
                "failures" : {
                  "doc_count" : 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The query is giving me the batch_id of the most recent batch (which is good), but is incorrectly telling me how many files in that batch failed (it should be 2).

I would appreciate any help on this!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.