Documents appearing in a bool query where one must clause does not match

I may not really understand the scoring issue, but here is my problem. I have a bool query. I am looking for a particular device (a_device_hostname), together with the other characteristics in the must clause.

{
  "size": 10,
  "_source": "a_device_hostname",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags": "smarts"
          }
        },
        {
          "match": {
            "a_device_hostname" : "WL-MIELI-ADC-B14-WISM8"
          }
        },
        {
          "match_phrase": {
            "message": "Down"
          }
        },
        {
          "query_string": {
            "default_field": "event_from",
            "query": "event_from:/.*PR.*/"
          }
        }
      ],
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now-7d",
            "lte": "now"
          }
        }
      }
    }
  },
  "aggs": {
    "cats": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      }
    }
  }
}

But I'm getting other devices in the results:

{
  "took": 605,
  "timed_out": false,
  "_shards": {
    "total": 698,
    "successful": 698,
    "skipped": 635,
    "failed": 0
  },
  "hits": {
    "total": 93,
    "max_score": 27.254362,
    "hits": [
      {
        "_index": "igemsbigdata-unicredit-2020.12",
        "_type": "doc",
        "_id": "ql_77XABbSnlRFWQ6mOB",
        "_score": 27.254362,
        "_source": {
          "a_device_hostname": "SW-MIELI-ADC-B14-WIFI-1"
        }
      },
      {
        "_index": "igemsbigdata-unicredit-2020.12",
        "_type": "doc",
        "_id": "sF_77XABbSnlRFWQ6mOB",
        "_score": 26.743238,
        "_source": {
          "a_device_hostname": "SW-MIELI-ADC-B14-WIFI-1"
        }
      },
      {
        "_index": "igemsbigdata-unicredit-2020.12",
        "_type": "doc",
        "_id": "M1QF43ABbSnlRFWQynoG",
        "_score": 24.763958,
        "_source": {
          "a_device_hostname": "RT-MIELI-ADC-B28-VOIPNEW"
        }
      },
      {
        "_index": "igemsbigdata-unicredit-2020.12",
        "_type": "doc",
        "_id": "O0cB43ABbSnlRFWQOTci",
        "_score": 24.68473,
        "_source": {
          "a_device_hostname": "RT-MIELI-ADC-B28-VOIPNEW"
        }
      },

Is there something I'm not understanding about the must clause?

Thanks in advance
Norm

Clauses in "must" have to match and also contribute to the score. Clauses in "filter" have to match too, but they are not scored. If you only want to restrict the results, use the "filter" clause.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-bool-query.html

I added an additional filter for the device; however, it seems to have made things worse.

{
  "size": 10,
  "_source": "a_device_hostname",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags": "smarts"
          }
        },
        {
          "match_phrase": {
            "message": "Down"
          }
        },
        {
          "query_string": {
            "default_field": "event_from",
            "query": "event_from:/.*PR.*/"
          }
        }
      ],
      "filter": [
        {
          "match": {
            "a_device_hostname": "WL-MIELI-ADC-B14-WISM8"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-7d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "cats": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      }
    }
  }
}

Results:

{
  "took": 301,
  "timed_out": false,
  "_shards": {
    "total": 698,
    "successful": 698,
    "skipped": 635,
    "failed": 0
  },
  "hits": {
    "total": 93,
    "max_score": 17.516088,
    "hits": [
      {
        "_index": "igemsbigdata-unicredit-2020.11",
        "_type": "doc",
        "_id": "x9NV23ABbSnlRFWQeHnV",
        "_score": 17.516088,
        "_source": {
          "a_device_hostname": "fw-mieli-a211-06-dmz"
        }
      },
      {
        "_index": "igemsbigdata-unicredit-2020.11",
        "_type": "doc",
        "_id": "pQ9s23ABbSnlRFWQTiM3",
        "_score": 17.512497,
        "_source": {
          "a_device_hostname": "fw-mieli-a211-06-dmz"
        }
      },
      {
        "_index": "igemsbigdata-unicredit-2020.11",
        "_type": "doc",
        "_id": "kX-V23ABbSnlRFWQh89F",
        "_score": 17.50359,
        "_source": {
          "a_device_hostname": "fw-mieli-a217-16-dmz"
        }
      },

How about trying term instead of match in your filter?

What is the mapping of that field?
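
For reference, here is a sketch of how the mapping and the indexed tokens could be checked from the Kibana Dev Tools console. The index name is copied from the response above. The lines starting with # are console comments, and the exact output depends on how the index was set up.

# Show how a_device_hostname is mapped (text, keyword, or both)
GET igemsbigdata-unicredit-2020.12/_mapping

# Show the tokens the field's analyzer produces for this hostname
GET igemsbigdata-unicredit-2020.12/_analyze
{
  "field": "a_device_hostname",
  "text": "WL-MIELI-ADC-B14-WISM8"
}

If the field is analyzed text, the second request would be expected to return several tokens rather than one. A keyword field returns the whole hostname as a single token.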

It's a searchable string.

If it is not mapped as keyword, it will be tokenized, and a match on any one token will count. If you want the full string to match, it must be mapped as keyword.
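
To illustrate, here is a sketch of two filter clauses that require the whole hostname rather than any single token. The first assumes a keyword sub-field named a_device_hostname.keyword exists; default dynamic mapping usually creates one, but that is an assumption about this index. The second keeps match on the analyzed field but sets operator to and, so every token of the hostname must be present.

      "filter": [
        {
          "term": {
            "a_device_hostname.keyword": "WL-MIELI-ADC-B14-WISM8"
          }
        }
      ]

      "filter": [
        {
          "match": {
            "a_device_hostname": {
              "query": "WL-MIELI-ADC-B14-WISM8",
              "operator": "and"
            }
          }
        }
      ]

The term variant is the stricter of the two. The match variant with operator and could still return a different hostname that happens to contain all of the same tokens.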

Sorry, I should have included that. It is mapped as keyword.

In a regular query, term does not find the device, but match and match_phrase do. However, when I use my bool query and put either match or match_phrase in the filter, I get no results.

{
  "size": 10,
  "_source": "a_device_hostname",
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags": "smarts"
          }
        },
        {
          "match_phrase": {
            "message": "Down"
          }
        },
        {
          "query_string": {
            "default_field": "event_from",
            "query": "event_from:/.*PR.*/"
          }
        }
      ],
      "filter": [
        {
          "match": {
            "a_device_hostname": "WL-MIELI-ADC-B14-WISM8"
          }
        },
        {
          "range": {
            "int_timestamp": {
              "gte": "now-7d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "cats": {
      "date_histogram": {
        "field": "int_timestamp",
        "interval": "1h"
      }
    }
  }
}

{
  "took": 972,
  "timed_out": false,
  "_shards": {
    "total": 735,
    "successful": 735,
    "skipped": 675,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "cats": {
      "buckets": []
    }
  }
}
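
One thing worth ruling out: this request differs from the earlier one in the range field as well, which changed from @timestamp to int_timestamp (the date_histogram field changed too). A minimal sketch for isolating the cause is to run the hostname filter on its own and look at hits.total, then add the range filter back on each timestamp field in turn. It assumes nothing beyond the field names already used in this thread.

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "a_device_hostname": "WL-MIELI-ADC-B14-WISM8"
          }
        }
      ]
    }
  }
}

If this returns hits but the full query does not, the hostname filter is probably not what empties the result set.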
