Regexp Query - How to detect the beginning and the end of a string?

Hello,
The documentation says:
If you want the regexp pattern to start at the beginning of the string or finish at the end of the string, then you have to anchor it specifically, using ^ to indicate the beginning or $ to indicate the end.
But if we put, for example, this document:

PUT test50/_doc/1 
{
  "request": "test"
}

and then launch a query with this regexp:

GET test50/_search
{
  "query": {
    "regexp": {
      "request.keyword": "(.* |^)test( .*|$)"
    }
  }
}

The goal is to find the word at the beginning, in the middle, or at the end of the string.
This request doesn't match the document.

Note that you don't need to use both the code format icon and the citation icon. Code format is perfect.

I can't answer the regex part, as I have a hard time writing regexes myself.
But are you aware that this is a very, very slow way to find a substring in a text?

If this is something you intend to run on a small dataset or only once, then that could be OK. Otherwise I'd suggest using an ngram analyzer at index time instead, and then a term query.
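
A minimal sketch of that ngram suggestion, under stated assumptions: the index name test50_ngram, the names substring_tokenizer and substring_analyzer, and the gram sizes 3 to 10 are all made up for illustration, and the data would have to be reindexed into the new index. Note that this finds "test" as a substring (so "testing" also matches), the term query is not analyzed (so the search term must already be lowercase), and terms shorter than min_gram or longer than max_gram will not match.

```
# Hypothetical index, tokenizer and analyzer names; gram sizes are arbitrary.
# index.max_ngram_diff must be raised because max_gram - min_gram exceeds the default of 1.
PUT test50_ngram
{
  "settings": {
    "index": {
      "max_ngram_diff": 7
    },
    "analysis": {
      "tokenizer": {
        "substring_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "substring_analyzer": {
          "type": "custom",
          "tokenizer": "substring_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "request": {
        "type": "text",
        "analyzer": "substring_analyzer"
      }
    }
  }
}

# Same sample document as in the question, indexed into the hypothetical index.
PUT test50_ngram/_doc/1
{
  "request": "test"
}

# The term query looks up the exact ngram "test" that was produced at index time.
GET test50_ngram/_search
{
  "query": {
    "term": {
      "request": "test"
    }
  }
}
```

The trade-off is a larger index in exchange for much cheaper lookups than a regexp scan over every term.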

Thank you for your reply.
The code format doesn't work on its own. It only worked when I activated both quote and code format.
Well, the dataset is large, and I have a big set of regexes that I should apply to the dataset.
But my problem is: how can we detect the beginning and the end of a string, as in a Perl regex with ^ and $?

It's because you did not add a blank line after:

But if we put, for example, this document

And before

and then launch a query with this regexp

Back to the questions:

Well, the dataset is large

So you will run into performance problems. But if that is OK for you, then go ahead.

But my problem is: how can we detect the beginning and the end of a string, as in a Perl regex with ^ and $?

As I said, I don't know, as I'm not a regex expert. Someone else might be able to help.
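
For what it's worth, the regexp query uses Lucene's regular expression syntax, which does not support ^ and $ as anchor operators: the pattern is always anchored and has to match the entire term. The documentation sentence quoted in the question describes how most other regex engines behave. So inside the pattern, ^ and $ are most likely read as literal characters, which would explain why the document is not found. A possible rewrite, offered as a sketch rather than a tested answer, is to drop the anchors and make the surrounding parts optional:

```
# Lucene patterns are implicitly anchored at both ends, so no ^ or $ is needed.
# (.* )? allows optional text ending in a space before the word,
# ( .*)? allows optional text starting with a space after it.
GET test50/_search
{
  "query": {
    "regexp": {
      "request.keyword": "(.* )?test( .*)?"
    }
  }
}
```

Against the sample document this pattern has to match the whole keyword value "test", which it does with both optional groups empty. It should also match values such as "a test", "test b" or "a test b", but not "testing".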
