Using a regex in Kibana Query DSL (filter option in Discover)

Good day everyone,

I am relatively new to Kibana and have the following problem:

I want to filter a field in Discover so that only values made up entirely of letters remain, i.e. values containing digits or special characters such as "_" or "-" should be excluded.
To do this, I tried to define a regex via "+Add filter" -> "Edit as Query DSL", but it does not work as I expected.
If I enter the following regex:

{
  "query": {
    "regexp": {
      "user": {
        "value": "[a-z]*"
      }
    }
  }
}

then values such as "sdasdas9_3" are still returned. Can someone help me filter my search correctly?

Thank you! That would be great

I can't reproduce this.

I created the following index and documents:


PUT delete_test_regex
{
  "mappings": {
    "properties": {
      "date": {"type": "date"},
      "check": {"type": "text"}
    }
  }
}

PUT delete_test_regex/_bulk
{ "index" : {}}
{ "date": "2022-04-01", "check": "abc"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "cde3"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "abc_"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "_abc"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "a_bc"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "abc"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "abc"}
{ "index" : {}}
{ "date": "2022-04-01", "check": "sdasdas9_3"}

I created a data view for the index, with the date field as the time field.

In Discover, I created a filter with the following regexp query (copied from the docs):

{
  "query": {
    "regexp": {
      "check": {
        "case_insensitive": true,
        "flags": "ALL",
        "max_determinized_states": 10000,
        "rewrite": "constant_score",
        "value": "[a-z]*"
      }
    }
  }
}

And it works as expected:

[Screen recording: Peek 2022-05-04 14-55]
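One reason this works: Lucene regexps are always anchored, so the pattern has to match an entire term, not just a substring of it. A rough Python analogy is `re.fullmatch` with `re.IGNORECASE` standing in for `"case_insensitive": true`. This is only an approximation (Lucene's regex syntax is not identical to Python's), and it assumes each distinct sample value from the bulk request above is indexed as a single term.

```python
import re

# [a-z]* with "case_insensitive": true; fullmatch mimics Lucene's implicit anchoring
PATTERN = re.compile(r"[a-z]*", re.IGNORECASE)

# Distinct values from the sample bulk request, each treated as ONE term (assumption)
SAMPLE_VALUES = ["abc", "cde3", "abc_", "_abc", "a_bc", "sdasdas9_3"]


def term_matches(term: str) -> bool:
    """True if the whole term matches the pattern (anchored, like Lucene)."""
    return PATTERN.fullmatch(term) is not None


def substring_matches(term: str) -> bool:
    """True if the pattern matches anywhere in the term (NOT how Lucene behaves)."""
    return PATTERN.search(term) is not None


if __name__ == "__main__":
    for value in SAMPLE_VALUES:
        print(f"{value!r}: anchored={term_matches(value)}, substring={substring_matches(value)}")
```

Under anchoring only "abc" matches. An unanchored `[a-z]*` would match every value, because it can match the empty string anywhere.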

Does anything in my example ring a bell for you?


Hi, thanks for the quick answer. Values like "sdasdas9_3" are indeed filtered out now, which is nice. However, some values with digits or special characters still show up, for example "xyz-xyz-xyz-001" and an email address like "abc-avc.abc@bc.com". Probably because of the "-". Is there a way to filter those out as well? Thanks a lot!

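If you look at the parts Discover highlights, you can see that Elasticsearch matches the regexp against parts of the indexed value rather than the whole value. A text field is split into terms at index time, and a document matches as soon as any single term matches the regexp.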
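To illustrate why "xyz-xyz-xyz-001" and "abc-avc.abc@bc.com" still match, this sketch splits each value into terms and applies the anchored regexp to each term separately. `rough_tokenize` is a hypothetical, simplified stand-in for text-field analysis: it lowercases the value and splits on anything that is not a letter, digit, "_" or ".". Elasticsearch's real standard analyzer follows Unicode word-boundary rules and may produce different terms.

```python
import re

# Anchored, case-insensitive [a-z]*, as in the regexp query above
PATTERN = re.compile(r"[a-z]*", re.IGNORECASE)


def rough_tokenize(value: str) -> list[str]:
    """Hypothetical stand-in for text-field analysis: lowercase, then split on
    any character that is not a letter, digit, '_' or '.'."""
    return [t for t in re.split(r"[^a-z0-9_.]+", value.lower()) if t]


def doc_matches(value: str) -> bool:
    """A document matches if ANY of its terms fully matches the pattern."""
    return any(PATTERN.fullmatch(term) for term in rough_tokenize(value))


def whole_value_matches(value: str) -> bool:
    """What the question actually asks for: the entire value is letters only."""
    return PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["sdasdas9_3", "xyz-xyz-xyz-001", "abc-avc.abc@bc.com", "abc"]:
        print(value, rough_tokenize(value), doc_matches(value), whole_value_matches(value))
```

In this model "xyz-xyz-xyz-001" matches through its "xyz" terms, and the email matches through "abc". Matching against the whole value would usually mean querying a non-analyzed keyword field instead; whether one exists depends on the mapping (the test index above maps check only as text).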

You can run the query manually to see the same effect:

GET delete_test_regex/_search
{
  "query": {
    "regexp": {
      "check": {
        "case_insensitive": true,
        "flags": "ALL",
        "value": "[a-z]*"
      }
    }
  },
  "highlight": {
    "fields": {
      "check": {}
    }
  }
}
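The highlight section of that request wraps matching terms in `<em>` tags by default. Using the same rough model as before (the hypothetical term boundaries below are an approximation, not Elasticsearch's real analyzer), this sketch marks the terms that the anchored regexp matches. This mirrors what Discover highlights.

```python
import re

# Anchored, case-insensitive [a-z]*, as in the regexp query
PATTERN = re.compile(r"[a-z]*", re.IGNORECASE)
# Hypothetical term boundaries: runs of letters, digits, '_' and '.'
TERM = re.compile(r"[a-z0-9_.]+", re.IGNORECASE)


def highlight(value: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every term that fully matches PATTERN, keeping separators intact."""

    def wrap(m: re.Match) -> str:
        term = m.group(0)
        return f"{pre}{term}{post}" if PATTERN.fullmatch(term) else term

    return TERM.sub(wrap, value)


if __name__ == "__main__":
    print(highlight("xyz-xyz-xyz-001"))
    # <em>xyz</em>-<em>xyz</em>-<em>xyz</em>-001
```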

I'd suggest posting this question again in the Elasticsearch forum to reach folks who know more about how Lucene handles regexes.

Sorry for not being of much help :disappointed:

That was a big help, thanks! I'll ask there then. :slight_smile:
