Match regular expressions in match or match_phrase queries

Hi,

I'm able to use "Match" queries to match single words and "match_phrase" to match exactly on a complete string. This is great and working for 80% of the matches I'm looking for with Watcher/ES.

The last 20% of the queries I need to watch for are regular expression based. For example:

LINK-5-CHANGED.*reset
LINK-5-CHANGED.*administratively

They're very similar, but I'm unable to get any query to match just one of the messages. match_phrase doesn't match at all.

If I add this message to syslog via logger ("logger -p auth.notice LINK-5-CHANGED and about to be reset"), it feeds into ES, and when I run the watch, the "match" queries for both of the expressions above match and send an email.

Is anyone able to help me match on these regular expressions, and advise how best to do it? I did see that you can override the operator and change it to AND when using the "match" query, but I don't think it works with the way I've written my code.

Here is a snippet of my code:

  },
  "input": {
    "search": {
      "request": {
        "indices": [ "logstash-*" ],
        "body": {
          "query": {
            "filtered": {
              "query": {
                "match": {"message": "LINK-5-CHANGED.*reset"}
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "now-16s"
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },

Thanks for any help you can give me.

Regards

Dennis

Hi Dennis,

You could try using the Regexp Query.
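
As a rough sketch of what that could look like: the regexp query matches against individual indexed terms, so it only helps here if the whole message is also indexed as a single untokenized term. The field name "message.raw" below is an assumption (a hypothetical not_analyzed copy of the message), not something your mapping necessarily has. The pattern has to match the entire term, hence the leading and trailing ".*", and a leading wildcard like this can be slow on large indices.

```json
{
  "regexp": {
    "message.raw": ".*LINK-5-CHANGED.*reset.*"
  }
}
```

This would replace the "match" clause inside the "filtered" query, with the range filter left as it is.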

There are several ways to do this I think.
The way that is fastest at query time is to set a marker on the document at ingest time if the message matches the regex. Do that in the indexing system. I'm sure Logstash has things for that if you happen to be using it.
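
Once such a marker exists, the watch no longer has to look at the message text at all. A minimal sketch of the query side, assuming the indexing system adds a tag to a "tags" field when the regex matches; the tag name "link_changed_reset" is made up for illustration, and this assumes it is indexed as a single lowercase term:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "link_changed_reset" } },
            { "range": { "@timestamp": { "gte": "now-16s" } } }
          ]
        }
      }
    }
  }
}
```

A second watch for the "administratively" case would use the same shape with its own tag.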

If you want to query historical data you'll have to figure out a query that will work for it. If the message field is:

Then something like

{ "match_phrase": {"message": {"query": "LINK-5-CHANGED reset", "slop": 4}}}

should find it.

As to why regular expressions don't work, there are lots of reasons. Firstly, the match query doesn't support them. There is a regexp query that does, but it works on the analyzed terms and not the source. The wikimedia-extra plugin has a source_regex filter that supports running against the source, but it requires a reasonably deep understanding of analysis to set up and use efficiently. In the worst case it still devolves into brute force. It's really a weapon of last resort.
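
This also explains why both watches fire on the test message. The match query analyzes its input like any other text, so with the default standard analyzer "LINK-5-CHANGED.*reset" is most likely broken into roughly the terms link, 5, changed and reset, and by default any one of them is enough for a hit. If order and distance between the words don't matter, a sketch of the operator override mentioned in the question would be:

```json
{
  "match": {
    "message": {
      "query": "LINK-5-CHANGED reset",
      "operator": "and"
    }
  }
}
```

With "and", every term must be present, so a message containing "administratively" but not "reset" should no longer match this watch. The exact terms depend on the analyzer configured for the message field, which I'm assuming is the default.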

If I'd already indexed lots of data, I'd probably go with the sloppy phrase queries. If I hadn't, I'd fix the tagging and reindex; if I were OK with only catching data produced after the change, fixing the tagging alone would do.