Problem matching patterns in phrases with Regexp Query


(Gustavo Valiati) #1

Hi guys!

I am looking for a recommendation to build a search.

In my database, some dynamic patterns must be matched and others must not be.

For example, I have these two documents:

{
  "_id" : 1,
  "description" : 
  """
  potato01

  Blacklist: potato5, potato02, Potato 4
  """
},
{
  "_id" : 2,
  "description" : 
  """
  potato 02

  Black-list here: potato 1, potato 9 and Potato3
  """
}

As you can see, the documents have a field called "description", which holds free text written by a person.

In my query, I want to find every document that contains some variation of "potato 2" (e.g. "Potato2", "Potato 2", "potato 02", etc.), but at the same time "potato 2" must not appear in the blacklist line.
Therefore, only the second document (id: 2) should be returned.
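
To make the requirement concrete, here is a minimal Python sketch of the intended logic, run on the raw description strings outside Elasticsearch. The pattern and the `wanted` helper are illustrative assumptions (in particular, treating any line that mentions "blacklist" or "black-list" as the blacklist line), not an Elasticsearch feature.

```python
import re

# The two sample descriptions from the post.
docs = {
    1: "potato01\n\nBlacklist: potato5, potato02, Potato 4",
    2: "potato 02\n\nBlack-list here: potato 1, potato 9 and Potato3",
}

# Assumed pattern: "potato", optional space, optional leading zero, "2",
# not followed by another digit (so "potato 25" is not a hit).
TARGET = re.compile(r"potato\s?0?2(?!\d)", re.IGNORECASE)
# Assumed marker for the blacklist line.
BLACKLIST_LINE = re.compile(r"black-?list", re.IGNORECASE)


def wanted(description: str) -> bool:
    """True if the target appears outside a blacklist line and never inside one."""
    hit_outside = False
    for line in description.splitlines():
        if BLACKLIST_LINE.search(line):
            if TARGET.search(line):
                return False  # the target is blacklisted in this document
        elif TARGET.search(line):
            hit_outside = True
    return hit_outside


matching_ids = [doc_id for doc_id, text in docs.items() if wanted(text)]
print(matching_ids)  # [2]
```

This is only a statement of the expected result; the open question is how to get the same behavior from a query.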

Since the patterns are dynamic, I wanted to use the regexp query.
However, it seems I have misunderstood how the regexp query works.

I have built the following search.
It seems the regexp query cannot match phrases, only single terms.
If I write just "potato" in the regex, it works, but if I write "potato@02", it does not.

GET my-dataset/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "regexp": {
            "description": {
              "value": "@potato.?0?2@",
              "flags": "ALL"
            }            
          }
        }
      ],
      "must_not": [
        {
          "regexp": {
            "description": {
              "value": "@black-?list@potato.?0?2@",
              "flags": "ALL"
            }            
          }
        }
      ]
    }
  }
}

Am I right that it does not work because the original phrase has already been tokenized? If so, how can I run the regex against the original string?
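
The tokenization suspicion can be illustrated with a rough Python simulation. It assumes "description" is an analyzed text field and approximates the analyzer by lowercasing and splitting on non-alphanumeric characters; `rough_tokens` is a hypothetical stand-in, not the real Elasticsearch analyzer, and the "@" operator is approximated with `.*`. The sketch compares matching the patterns against each token with matching them against the whole string.

```python
import re

docs = {
    1: "potato01\n\nBlacklist: potato5, potato02, Potato 4",
    2: "potato 02\n\nBlack-list here: potato 1, potato 9 and Potato3",
}

# Approximations of the two patterns in the query, with "@" written as ".*".
SHOULD = re.compile(r".*potato.?0?2.*", re.DOTALL)
MUST_NOT = re.compile(r".*black-?list.*potato.?0?2.*", re.DOTALL)


def rough_tokens(text: str) -> list[str]:
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_level_match(text: str) -> bool:
    """Each pattern must match one whole token, as with a term-level query."""
    tokens = rough_tokens(text)
    should = any(SHOULD.fullmatch(t) for t in tokens)
    must_not = any(MUST_NOT.fullmatch(t) for t in tokens)
    return should and not must_not


def whole_string_match(text: str) -> bool:
    """Each pattern must match the entire, untokenized (lowercased) string."""
    lowered = text.lower()
    return bool(SHOULD.fullmatch(lowered)) and not MUST_NOT.fullmatch(lowered)


term_hits = [i for i, t in docs.items() if term_level_match(t)]
string_hits = [i for i, t in docs.items() if whole_string_match(t)]
print(term_hits)    # [1]
print(string_hits)  # [2]
```

Under these assumptions, token-level matching selects document 1 (through the single token "potato02") and can never apply the blacklist pattern, because no single token spans "blacklist ... potato02". Matching against the whole string selects document 2, which is the result I am after.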

Do you have any suggestions that could help me find the mistake, or another way to solve this?

Thanks.

