Creating a transform with must_not bool not working as expected

Hi everyone,

I am trying to create a transform and facing some issues with using the must_not for a specific condition I have. Here are the details:

Source index sample document:

{
  "app.status": "COMPLETED",
   "@timestamp": "2021-03-15T00:01:01.013Z"
}

Possible values for app.status are COMPLETED, TERMINATED, SUSPENDED or RETRY

I want to apply a transform that pulls only the documents that have app.status other than COMPLETED or TERMINATED.

Here is the full bool I tried using

{
    "must_not": [
        {"term": {"app.status": "COMPLETED" }},
        {"term": {"app.status": "TERMINATED" }}
    ]
}    

But I am not getting the right result. I verified this by checking the number of records that match the condition from the discover screen.

When I use a must as shown below I am able to accomplish the same. But there is a possibility that app.status can have other different values in future. So I will not need to update my transform in the future if I can accomplish this with must_not instead of should.

"bool": {
            "should": [
              {
                "term": {
                  "app.status": "RETRY"
                }
              },
              {
                "term": {
                  "app.status": "SUSPENDED"
                }
              }
            ],
            "minimum_should_match": 1
          }

Can someone please help me out here.

The query looks good. Just to clarify, are you providing this query in transform.source.query? Or in some other place?
Could you send the full transform config?

I tried to reproduce your issue locally, using the following transform config:

{
  "source": {
    "index": [
      "my-index"
    ],
    "query": {
      "bool": {
        "must_not": [
          {"term": {"app.status": "COMPLETED"}},
          {"term": {"app.status": "TERMINATED"}}
        ]
      }
    }
  },
  "pivot": {
    "group_by": {
      "app.status": {"terms": { "field": "app.status.keyword" } }
    },
    "aggregations": {
      "max_timestamp": { "max": { "field":  "@timestamp" } }
    }
  }
}

and the results were as expected.

Could you clarify what exact numbers were you comparing?

+1 That's definitely a good approach to use exclusions rather than inclusions in this case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.