Regexp character '!' doesn't work in ES DSL query even on fields of type 'keyword' in mappings

My Elasticsearch mapping looks like this:

{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "node": {
      "properties": {
        "field1": {
          "type": "keyword"
        },
        "field2": {
          "type": "keyword"
        },
        "query": {
          "properties": {
            "regexp": {
              "properties": {
                "field1": {
                  "type": "keyword"
                },
                "field2": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

Problem:
I build ES queries using elasticsearch_dsl Q(). It works perfectly in most cases, even when my query contains a complex regexp. But it fails completely if the regexp contains the character '!': no results are returned when the search term contains '!'.

For example:

1.) Q('regexp', field1 = "^[a-z]{3}.b.*") (works perfectly)
2.) Q('regexp', field1 = "^f04.*") (works perfectly)
3.) Q('regexp', field1 = "f00.*") (works perfectly)
4.) Q('regexp', field1 = "f04baz?") (works perfectly)

It fails in this case:
5.) Q('regexp', field1 = "f04((?!z).)*") (fails, with no results at all)

I tried adding "analyzer": "keyword" alongside "type": "keyword" on the fields above, but then nothing works at all.

In the browser, I checked how the keyword analyzer handles the input that fails:

http://localhost:9210/search/_analyze?analyzer=keyword&text=f04((?!z).)*

It looks fine, with this result:

{
  "tokens": [
    {
      "token": "f04((?!z).)*",
      "start_offset": 0,
      "end_offset": 12,
      "type": "word",
      "position": 0
    }
  ]
}

I'm running my queries like this:

from elasticsearch_dsl import Search, Q

search_obj = Search(using=_conn, index=_index, doc_type=_type).query(Q('regexp', field1="f04baz?"))
count = search_obj.count()
response = search_obj[0:count].execute()
logger.debug("total nodes(hits):" + " " + str(response.hits.total))
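To rule out the Python client altering the pattern, here is a minimal sketch that builds by hand the request body a bare regexp query should serialize to (with elasticsearch_dsl, `search_obj.to_dict()` should print the same structure). The helper name `regexp_search_body` is hypothetical, not part of elasticsearch_dsl.

```python
import json


def regexp_search_body(field: str, pattern: str) -> dict:
    """Build the JSON body of a search containing one bare regexp query.

    Hypothetical helper: it mirrors the structure that
    Search().query(Q('regexp', **{field: pattern})).to_dict() should produce.
    The pattern is placed in the body verbatim; nothing is escaped or analyzed.
    """
    return {"query": {"regexp": {field: pattern}}}


body = regexp_search_body("field1", "f04((?!z).)*")
# json.dumps leaves '!', '?', '(' and ')' untouched, so the pattern reaches
# Elasticsearch exactly as written.
print(json.dumps(body, indent=2))
```

Pasting the printed JSON under `GET search/_search` in Kibana runs the same query without Python; `search` is the index name taken from the `_analyze` URL in the question.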

Please help; this is a really annoying problem, since every other regex character works fine in my queries except '!'.

Also, how do I check which analyzer is currently applied to these fields with the mapping above?
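A `keyword` field is not analyzed: each value is indexed as a single token, and keyword fields take an optional `normalizer`, not an `analyzer`. To see what the mapping actually says, fetch it with `GET search/_mapping` and walk its `properties`. The sketch below does that walk over the `node` properties from the mapping in the question; the function name `describe_fields` is hypothetical.

```python
def describe_fields(properties: dict, prefix: str = "") -> list:
    """Walk a mapping's "properties" tree and list (path, type, analyzer, normalizer).

    Hypothetical helper. None means the mapping sets nothing explicitly; for
    text fields a missing analyzer means the index default (normally standard).
    """
    rows = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields with a dotted prefix.
            rows.extend(describe_fields(spec["properties"], prefix=path + "."))
        else:
            rows.append((path, spec.get("type"), spec.get("analyzer"), spec.get("normalizer")))
    return rows


# The "properties" of the "node" type from the mapping in the question.
node_properties = {
    "field1": {"type": "keyword"},
    "field2": {"type": "keyword"},
    "query": {"properties": {"regexp": {"properties": {
        "field1": {"type": "keyword"},
        "field2": {"type": "keyword"},
    }}}},
}

rows = describe_fields(node_properties)
for row in rows:
    print(row)
```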

Did you also open:

? Is it the same question?

If so, please don't open multiple questions on the same topic.

Sure. Can you please help me with this question?

No. I'm not good at regex, sadly.

OK. In my mapping, the fields are set to type 'keyword'. Isn't that all that's required for the search term to be read as-is? If so, why is the '!' character being dropped? Also, can I use the analyze() function to see exactly what's happening? I already ran the same request in the browser as in my first post (http://localhost:9210/search/_analyze?analyzer=keyword&text=f04((?!z).)*), and it returned the single token "f04((?!z).)*", which looks fine.

But I don't know how to check what's happening in the code.

Use the analyze API to see how your text is indexed with the keyword tokenizer; that is the same thing that happens with a keyword field.

I believe ? is correctly indexed.
But I guess that in a regex you need to write something like ?? or ?

Again I'm not good at this :slight_smile:

I can't follow the docs on how to use the analyze API. I execute my query exactly as in my first post, with Search(using=_conn, index=_index, doc_type=_type).query(Q('regexp', field1="f04baz?")) followed by count() and execute().

Can you tell me how to add analyze code there to check how the term is being tokenized?

See https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html

I need to do something like this: https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Index.analyze in order to check how the term is being analyzed now. But I believe it is analyzed correctly, simply because of type: keyword in the mapping; moreover, all the other regexp queries work, and only the one containing '!' doesn't.

Can you help me add this analyze method to my code?

Start Kibana and run the sample code in the Dev Tools Console. You don't need to add the analyze API call to your project.
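In the Kibana Dev Tools Console, the `_analyze` API takes the analyzer and the text in a JSON body. A minimal sketch that prints such a request, ready to paste into the Console; the helper name `analyze_console_request` is hypothetical, and the index name `search` comes from the URL in the question.

```python
import json


def analyze_console_request(index: str, analyzer: str, text: str) -> str:
    """Return a Dev Tools Console request for the _analyze API (hypothetical helper)."""
    body = {"analyzer": analyzer, "text": text}
    # Console format: method and path on the first line, JSON body below it.
    return f"GET {index}/_analyze\n{json.dumps(body, indent=2)}"


print(analyze_console_request("search", "keyword", "f04((?!z).)*"))
```

Swapping `"analyzer": "keyword"` for `"field": "field1"` asks Elasticsearch to derive the analysis from that field's mapping instead.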

I just tried ^(f04ba)[^z]+?$ instead of f04((?!z).)* to exclude results containing z, and it worked. Does this give you any hint as to why ! in the regexp query doesn't return any results?
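The working rewrite points at the lookahead `(?!z)` rather than the `!` character itself. Python's `re` module does support lookaheads, so it can check that a lookaround-free pattern has the same intent. The sketch below compares the original pattern with `f04[^z]*`, a simplified rewrite assumed here (not the exact pattern above), on made-up sample values. It uses `re.fullmatch` because Lucene regular expressions always have to match the whole term.

```python
import re

LOOKAHEAD = r"f04((?!z).)*"   # original pattern: needs a Perl-style engine
LUCENE_SAFE = r"f04[^z]*"     # assumed rewrite: same intent, no lookaround

# Made-up sample terms for illustration only.
samples = ["f04", "f04bar", "f04baz", "f04zoo", "f05bar", "f04bar-x"]

for s in samples:
    # fullmatch mimics Lucene, where the pattern must cover the entire term.
    a = re.fullmatch(LOOKAHEAD, s) is not None
    b = re.fullmatch(LUCENE_SAFE, s) is not None
    assert a == b, f"patterns disagree on {s!r}"
    print(f"{s!r}: {a}")
```

Python's `re` is only a stand-in for checking intent here; it is not Lucene's engine, so the final pattern should still be tested against Elasticsearch.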

The main issue is that Lucene's regular expression engine has no lookarounds. Lucene regular expressions are not Perl-compatible and support a smaller range of operators; see Regular Expression Syntax.
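Since Lucene has no lookarounds, a quick client-side check can flag them before a pattern is sent. A minimal sketch, assuming only the four common openers `(?=`, `(?!`, `(?<=` and `(?<!` need catching; the function name `find_lookarounds` is hypothetical, and the check ignores backslash escaping, so treat a hit as a warning rather than proof.

```python
import re

# Perl/PCRE-style lookaround openers, which Lucene's regexp syntax lacks.
_LOOKAROUND = re.compile(r"\(\?(?:=|!|<=|<!)")


def find_lookarounds(pattern: str) -> list:
    """Return the lookaround openers found in `pattern`, in order (hypothetical helper)."""
    # The group is non-capturing, so findall returns the whole matched opener.
    return _LOOKAROUND.findall(pattern)


print(find_lookarounds("f04((?!z).)*"))     # ['(?!']
print(find_lookarounds("^(f04ba)[^z]+?$"))  # []
```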

But you do have an option: in Elasticsearch, you can enable certain optional operators.

The ~ (tilde) is the complement operator, used to negate the atom right after it. An atom is either a single symbol or a group of subpatterns/alternatives inside parentheses.
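With the complement operator, "f04 followed by anything that contains no z" can be written as `f04~(.*z.*)`: the group `(.*z.*)` matches any string containing a z, and `~` negates it. A minimal sketch of the request body, assuming your Elasticsearch version still accepts the `COMPLEMENT` flag (check the Regular Expression Syntax page for your version) and that the prefix contains no reserved regex characters. Both function names are hypothetical; the second only spells out the intended meaning in plain Python.

```python
def no_z_after_prefix_query(field: str, prefix: str) -> dict:
    """Regexp query body: terms starting with `prefix` with no 'z' after it.

    Hypothetical helper. Assumes `prefix` has no reserved regex characters.
    The COMPLEMENT flag is set explicitly so that ~ is treated as an operator.
    """
    return {
        "query": {
            "regexp": {
                field: {
                    "value": f"{prefix}~(.*z.*)",
                    "flags": "COMPLEMENT",
                }
            }
        }
    }


def emulate_no_z_after_prefix(prefix: str, term: str) -> bool:
    """Plain-Python reading of <prefix>~(.*z.*), for checking intent only."""
    return term.startswith(prefix) and "z" not in term[len(prefix):]


body = no_z_after_prefix_query("field1", "f04")
print(body)
for term in ["f04bar", "f04baz", "f05bar"]:
    print(term, emulate_no_z_after_prefix("f04", term))
```

With elasticsearch_dsl, the same query should be expressible as `Q('regexp', field1={'value': 'f04~(.*z.*)', 'flags': 'COMPLEMENT'})`.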
