Match any character regular expression does not match as expected

POST /search_test/_doc
{
  "text":"Jack and Jill went up the hill."
}
GET /search_test/_search?q=j%2A
GET /search_test/_search?q=j...

I would expect these two queries to return the same document, since both expressions should match "Jack" and "Jill", but they do not: searching for j... returns nothing. As stated here, I would expect each period to match any single character.

Note that j%2A decodes to j*
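
For anyone who wants to double-check the encoding, here is a minimal sketch using Python's standard library (Python is only an illustration here, not part of the original setup). It shows that %2A decodes to an asterisk and that the periods in j... need no encoding, so they reach the server as literal dots.

```python
from urllib.parse import quote, unquote

# Decode the percent-encoded query value used in the first request.
decoded = unquote("j%2A")
print(decoded)  # j*

# Periods are left as-is by percent-encoding, so "j..." is sent unchanged.
encoded_dots = quote("j...")
print(encoded_dots)  # j...
```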

I've discovered via the docs that:

Query parameter searches do not support the full Elasticsearch Query DSL but are handy for testing.

Thinking this may be my problem, I changed the request to:

GET /search_test/_search
{
  "query": {
    "query_string": {
      "query": "J..."
    }
  }
}

but I get the same result. I'm using query_string because I must support a generic, search-engine-style text box.

The query in your query_string request ("J...") differs from the one you specified in the URL query parameter ("j..."). Can you clarify which one you mean?

Thanks!

Oh good catch. Lowercase "j" gives the same 0 results:

GET /search_test/_search
{
  "query": {
    "query_string": {
      "query": "j..."
    }
  }
}

Edit: j* and J* work as expected.

Ok, that makes more sense.

To explain, your queries mean two very different things. The first one (j...) searches for the term j in the inverted index. Think of it as the index at the back of a book: you jump to the letter J and find entries for Jack and Jill, but no entry for J on its own.

The second one (J*) searches for every term starting with J, so Jack and Jill are both hits.
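
The difference between the two lookups can be sketched with a toy inverted index in Python. This is only a rough model, not how Elasticsearch is implemented: it assumes the text is split into lowercase word tokens and that the query term is lowercased as well, and the helper names term_lookup and prefix_lookup are made up for illustration.

```python
import re

# Toy model of the indexed terms: lowercase word tokens (an assumption
# about what the default analysis would roughly produce).
text = "Jack and Jill went up the hill."
terms = sorted(set(re.findall(r"[a-z]+", text.lower())))

def term_lookup(term):
    """Exact lookup: only a term that is itself in the index matches."""
    return [t for t in terms if t == term]

def prefix_lookup(prefix):
    """Wildcard-style lookup such as j*: every term starting with the prefix."""
    return [t for t in terms if t.startswith(prefix)]

print(term_lookup("j"))    # []
print(prefix_lookup("j"))  # ['jack', 'jill']
```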

Hope that makes sense.

I'm not sure I follow - I'm expecting the periods in the query string to act as regular expression dots, i.e. match on any character (docs). I am unable to execute any regular expressions using a query_string, which is supposed to be supported:

Elasticsearch supports regular expressions in the following queries:

Is there something I'm missing here? Why can I not execute regular expressions in a query_string?

It turns out you need to wrap a regular expression in forward slashes when using it in a query_string:

Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/")
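
As a minimal sketch of what that means for the query above, the Python snippet below builds the request body with the pattern wrapped in slashes. The helper regex_query_string is hypothetical, and the local check uses Python's re module as a stand-in for the real matching (the two regex dialects differ in places, though not for a pattern this simple). The token list assumes lowercase word tokens, and fullmatch is used because the pattern has to match a whole term.

```python
import json
import re

def regex_query_string(pattern):
    """Hypothetical helper: wrap a pattern in forward slashes for query_string."""
    return {"query": {"query_string": {"query": f"/{pattern}/"}}}

# Body to send with GET /search_test/_search
body = regex_query_string("j...")
print(json.dumps(body, indent=2))

# Rough local check of the dots: "j" followed by exactly three characters,
# matched against whole tokens (assumed lowercase word tokens).
tokens = ["jack", "and", "jill", "went", "up", "the", "hill"]
matches = [t for t in tokens if re.fullmatch("j...", t)]
print(matches)  # ['jack', 'jill']
```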

Source

This uncovered another problem: query_string appears to have no case_insensitive flag like the one the regexp query has.
