Regexp query for URL matching

Hi there!

I’m having trouble using the regexp query in Elasticsearch.

I want to identify records that look like this:
"parsed_message": "GET https://server/different-endpoints/fix-endpoint/library.js?m=15f2fe2ddf0, Referrer=https://url.com"

I’ve read https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html and learned that Elasticsearch regex matching works a bit differently. Based on what I read, I played around and found that

GET logs-*/_search
{
  "query": {
      "regexp": {
        "parsed_message": ".*\\.js.*"
      }
  }
}

actually finds results. Since this expression is way too rough, I tried

GET logs-*/_search
{
  "query": {
      "regexp": {
        "parsed_message": ".*\\.js\\?.*"
      }
  }
}

to make it a bit more precise. But even after this small change, the query no longer matches anything. What am I doing wrong?

Best regards,
BK

Does no one here have any experience with this?

Using a regular expression is not an efficient way to solve this. What if you decided at index time whether the field contains a URL, and extracted that URL or its parts? Your query could then become much simpler, maybe just a check for the existence of a field or a query against the deconstructed parts of the URL.

This, however, requires some preprocessing work (a classic search tradeoff: you need to decide whether to spend CPU cycles on preprocessing at index time or on matching at search time).

You could use a self-written ingest processor to extract URLs from text, or just a preprocessing Python script.

This blog post might be of interest: https://www.elastic.co/blog/writing-your-own-ingest-processor-for-elasticsearch
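
As a rough illustration of the preprocessing-script idea, here is a minimal Python sketch that pulls the first URL out of a message like the one in the question and splits it into parts before indexing. The function name `extract_url_fields`, the `url_*` field names, and the URL pattern are all made up for this example; they are assumptions, not anything Elasticsearch provides.

```python
import posixpath
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical pattern: an http(s) URL runs until whitespace or a comma.
# Real log lines may need something stricter.
URL_PATTERN = re.compile(r"https?://[^\s,]+")


def extract_url_fields(parsed_message):
    """Return structured fields for the first URL in the message, or {} if none."""
    match = URL_PATTERN.search(parsed_message)
    if match is None:
        return {}
    parts = urlsplit(match.group(0))
    # splitext returns e.g. ".js"; drop the leading dot for easier term queries.
    extension = posixpath.splitext(parts.path)[1].lstrip(".")
    return {
        "url_host": parts.netloc,
        "url_path": parts.path,
        "url_extension": extension,
        "url_has_query": bool(parts.query),
        "url_query": parse_qs(parts.query),
    }


message = (
    "GET https://server/different-endpoints/fix-endpoint/library.js"
    "?m=15f2fe2ddf0, Referrer=https://url.com"
)
fields = extract_url_fields(message)
print(fields)
```

If fields like these were added to each document before indexing, the search would presumably no longer need a regexp at all: an exact match on the (hypothetical) `url_extension` field combined with a filter on `url_has_query` would express "a .js request with a query string".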

Thank you for your tips, I implemented it via grok and tagging for now. :slight_smile:
