Regexp support for matching at the beginning (^) and the end ($)


(Jin Wu) #1

I tried to use the regular expression ^10AB[0-9]{8}$ to match strings that start with "10AB" and end with 8 digits, but it seems Elasticsearch does not support "^" and "$" for matching at the beginning and end. If I leave "^" and "$" out of the regular expression, I get more than I want, for instance 10AB00000001, 10AB00000001:count, etc. Note that 10AB00000001:count is not what I want.

So I just want to verify: does Elasticsearch support "^" and "$" for matching at the beginning and end? If not, is there another way to achieve the same result? Thanks.


(Mark Walkom) #2

You can either use the simple Lucene query syntax - https://lucene.apache.org/core/2_9_4/queryparsersyntax.html - or the larger ES query DSL - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

I just tried a quick test, but this didn't seem to work:

PUT test
{
  "number": "10AB00000001"
}

GET test/_search?q=number=10AB???????1

Maybe someone has an idea though.


(Daniel Mitterdorfer) #3

Hi Jinwu,

According to the docs, Lucene regular expressions are always anchored, i.e. the pattern must match the entire term, so there is no need to anchor with "^" and "$". For details, please see the docs on the regexp query, the docs on the regular expression syntax, and my example below:

Create an index:

PUT /rtest
{
   "mappings": {
      "num": {
         "properties": {
            "number": {
               "type": "string",
               "index": "not_analyzed"
            }
         }
      }
   }
}

Index two documents:

PUT /rtest/num/1
{
  "number": "10AB00000001"
}
PUT /rtest/num/2
{
  "number": "10AB00000001:count"
}

Search:

GET /rtest/num/_search
{
   "query": {
      "regexp": {
         "number": {
            "value": "10AB[0-9]{8}"
         }
      }
   }
}

This returns one hit: "10AB00000001", just as we want:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "rtest",
            "_type": "num",
            "_id": "1",
            "_score": 1,
            "_source": {
               "number": "10AB00000001"
            }
         }
      ]
   }
}
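
To make the difference between anchored and unanchored matching concrete outside of Elasticsearch, here is a minimal Python sketch. It is only an analogy: Python's re module is not Lucene's regex engine, and the helper names (anchored_matches, unanchored_matches) as well as the third sample value are made up for illustration. re.fullmatch stands in for Lucene's implicit anchoring, while re.search shows the "contains" behaviour that would make explicit "^" and "$" necessary.

```python
import re

# Pattern from this thread, written without "^" and "$".
PATTERN = r"10AB[0-9]{8}"

# The first two values are the documents indexed above;
# the third is a hypothetical value with a leading extra character.
VALUES = ["10AB00000001", "10AB00000001:count", "X10AB00000001"]


def anchored_matches(pattern, values):
    """Keep values that the pattern matches in full (analogous to Lucene's anchoring)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(v)]


def unanchored_matches(pattern, values):
    """Keep values that contain a match anywhere (what an unanchored engine would do)."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.search(v)]


if __name__ == "__main__":
    print(anchored_matches(PATTERN, VALUES))    # ['10AB00000001']
    print(unanchored_matches(PATTERN, VALUES))  # all three values
```

Conversely, if substring matching is actually wanted with an anchored engine, the pattern has to allow arbitrary text on both sides explicitly, e.g. .*10AB[0-9]{8}.* instead of 10AB[0-9]{8}.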

For a more formal description of the supported syntax, see the Javadoc of Lucene's RegExp class.

Daniel


(Mark Walkom) #4

Ahh nice!

