Search using complex regex is not working on Kibana 6

(Ayman) #1


I'm trying to get the count the hits of an url pattern on Kibana, i have regex to catch urls in arabic and english.

i tried this request_url: /(en|ar|fr)/files/([\-_%A-Z0-9؀-ۿ=a-z\s].*)[0-9]+/$/ but it didn't work can you please advise?


(Spencer Alger) #2
  1. Assuming that is the exact text you are entering in the search box, make sure you remove the space between request_url: and the regular expression or it won't be limited to that field.

  2. You probably want to escape the / characters in your regex, replacing them with \/.

  3. I'm pretty sure your regular expression has to match the entire term, so you probably want to prefix your regex with .* to allow anything before the (en|ar|fr)

  4. Finally, make sure your field is not analyzed. When elasticsearch tests values against that regular expression it is going to use the indexed value which, by default, uses the standard analyzer and will break the string up into a bunch of tokens. To prevent that from happening you'll want to use the keyword type, which will index the entire value as a single token. This is done by default if you index JSON strings that don't have a mapping, in a sub-field with the suffix .keyword. Does your index have a request_url.keyword field? If not, can you reindex your data to create a keyword version of it?

(Ayman) #3

Hello Spencer,
Thanks for your reply.

Unfortunately, i tried all your recommendations and nothing worked with me.

Yes i do have the .keyword field for all strings.
Examples of regex i tried"

request_url.keyword:/.*(en|ar|fr)\/files\/([\-_%A-Z0-9؀-ۿ=a-z\s].*)[0-9]+\/$/ request_url.keyword:/(en|ar|fr)\/files\/([\-_%A-Z0-9؀-ۿ=a-z\s].*)[0-9]+\/$

And many others.
If that is not possible through Kibana, is it possible to run this query on elasticsearch directly?


(Spencer Alger) #4

Hmm, it might be arabic characters that are breaking the regex, sure wish is had more verbose error options. You can try executing the request directly to elasticsearch with the Dev Tools > Console app. Try something like:

POST /index/_search
  "query": {
    "regexp": {
      "request_url.keyword": ".*(en|ar|fr)\/files\/([\-_%A-Z0-9؀-ۿ=a-z\s].*)[0-9]+\/$"

Notice how the JSON syntax highlighter failed to find the " because of some of the characters in that regexp... Maybe you should try a negative character class rather than trying to list all the characters you want to support? (heading "Negated Character Classes")

(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.