Query String Regex/WildCard Search

Hi Team,

Problem Statement
Search String - "Hello * World"
Expected Output - 1. "Hello First World"
2. "Hello Second World"

I need results that contain one word in place of *. I have tried query_string with both regex and wildcard syntax, but neither returns the expected results.
Can anyone please help?

Hi,

Do you realize that "*" also matches spaces?
It depends on which characters your "words" can contain, but a regexp query with "Hello [!-~]+ World" will match only phrases that have exactly one word of printable ASCII characters between Hello and World.
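
To make the character class concrete, here is a small sketch in Python rather than Elasticsearch. It assumes that Python's `re.fullmatch` is a fair stand-in for Lucene's anchored regexp matching for this simple pattern; the two regex dialects differ in other respects. The helper name `whole_match` is made up for illustration. Note that `[!-~]` on its own matches a single character, so a `+` is needed to cover a whole word.

```python
import re

# '!' (0x21) through '~' (0x7E) covers printable ASCII except the space.
ONE_CHAR = r"Hello [!-~] World"   # exactly one character between the words
ONE_WORD = r"Hello [!-~]+ World"  # one run of non-space characters

def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

samples = [
    "Hello First World",
    "Hello a World",
    "Hello World",
    "Hello two words World",
]
for s in samples:
    print(f"{s!r}: one_char={whole_match(ONE_CHAR, s)} one_word={whole_match(ONE_WORD, s)}")
```

This only helps when the whole value is indexed as a single term (for example a keyword field), because the pattern has to match the complete term.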

Next time, sharing the actual results alongside the expected ones will help you get a more suitable answer.

Hi Tomo_M,

I have tried the query below:

    "query_string": {
      "query": "/Hello (.*?) World/",
      "fields": ["abc", "xyz"]
    }

The regex correctly matches the words between **Hello** and **World**, but it does not work with query_string and returns 0 results.

What type of field are you querying on: text, keyword or wildcard?

@Tomo_M I am querying on text fields.

text fields split your JSON strings into multiple words in the index (e.g. hello, brave, new and world).
Searches always match what is in the index.
The above query_string regex search is asking for a single word in the index that contains spaces, which is not going to exist.
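
A toy model of this, as a sketch only: `toy_analyze` below is a made-up stand-in for a real analyzer (it just lowercases and splits on non-alphanumeric characters), and Python's `re` is used in place of Lucene's regexp engine. The point it illustrates is that a term-level regex is tested against each indexed token separately.

```python
import re
from typing import List

def toy_analyze(text: str) -> List[str]:
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def any_term_matches(pattern: str, text: str) -> bool:
    """A term-level regex has to match one whole indexed token."""
    return any(re.fullmatch(pattern, tok) for tok in toy_analyze(text))

doc = "Hello First World"
print(toy_analyze(doc))
# No single token contains spaces, so the multi-word pattern cannot match.
print(any_term_matches(r"Hello (.*?) World", doc))
# A pattern that targets one token does match.
print(any_term_matches(r"fir.*", doc))
```
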
Perhaps more simply, you need to use a phrase query to find documents that contain both words, hello and world, close to each other. This can be done using this syntax in query_string:

    "query_string": {
      "query": "\"Hello world\"~2"
    }

The quotes mean "find these words next to each other" and the ~2 modifier means "with up to two word positions different".

Thanks @Mark_Harwood,

I want to fetch strings that contain at least 1 word between Hello and World.

My remark in my first reply that "*" also matches spaces applied to the regexp query, not to the query_string query, sorry.

As Mark said, a single regex (enclosed in '/') is matched against single tokens, and it cannot be used inside phrases.

Wildcard searches are also not supported within phrases.
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

I'm not sure whether its performance is acceptable, but one possible way is to use a wildcard field with a regexp query such as "Hello [!-~]+ World" or "[hH]ello [!-~]+ [wW]orld". One drawback (or possibly an advantage) is that the words are not analyzed, so filters such as stemming are not applied.

Did you try?

    "query": "\"Hello world\"~1"

@Mark_Harwood, yes, I have tried the above query, and it also returns strings that contain the exact phrase "Hello World".

I need results that contain at least 1 word between Hello and World.
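
Why `"Hello world"~1` also returns the plain "Hello World" can be seen in a simplified model of slop. This is a sketch for a two-term phrase only, assuming whitespace tokenization. Both function names are made up for illustration, and the real scorer handles more cases. Slop is an upper bound on how far the words may drift from their phrase positions, so zero words in between always qualifies.

```python
from typing import List

def tokens(text: str) -> List[str]:
    """Whitespace tokenization with lowercasing (a simplification)."""
    return text.lower().split()

def sloppy_pair_match(text: str, first: str, second: str, slop: int) -> bool:
    """Two-term phrase: 'second' is expected one position after 'first'."""
    toks = tokens(text)
    firsts = [i for i, t in enumerate(toks) if t == first]
    seconds = [i for i, t in enumerate(toks) if t == second]
    return any(abs(s - 1 - f) <= slop for f in firsts for s in seconds)

def words_between_match(text: str, first: str, second: str,
                        min_gap: int, max_gap: int) -> bool:
    """The behaviour asked for here: a lower AND an upper bound on the gap."""
    toks = tokens(text)
    firsts = [i for i, t in enumerate(toks) if t == first]
    seconds = [i for i, t in enumerate(toks) if t == second]
    return any(min_gap <= s - f - 1 <= max_gap for f in firsts for s in seconds)

for doc in ["Hello World", "Hello First World", "Hello one two World"]:
    print(doc,
          sloppy_pair_match(doc, "hello", "world", 1),
          words_between_match(doc, "hello", "world", 1, 1))
```

Slop alone cannot express the lower bound, which is why the adjacent phrase keeps showing up in the results.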

Check out interval queries: more complex, but more powerful. JSON required....

Hi @Tomo_M ,

The wildcard field type was introduced in Elasticsearch 7.9.
My current Elasticsearch version is 5.6.

Is there a way to achieve this scenario in Elasticsearch 5.6?

@Mark_Harwood Interval queries are not available in version 5.6.

Is there a way to achieve this scenario in Elasticsearch 5.6?

See the forerunner, span queries
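
One possible shape for such a span query, as a sketch that has not been run against a 5.6 cluster: a `span_not` whose `include` is an ordered `span_near` with some slop and whose `exclude` is the same pair with slop 0, so the directly adjacent "hello world" is filtered out at the span level. The field name `abc` is taken from earlier in the thread. The lowercase terms assume the field uses an analyzer that lowercases tokens, and the helper names are made up.

```python
import json
from typing import Any, Dict

def span_near(field: str, first: str, second: str, slop: int) -> Dict[str, Any]:
    """Ordered span_near over two span_term clauses."""
    return {
        "span_near": {
            "clauses": [
                {"span_term": {field: first}},
                {"span_term": {field: second}},
            ],
            "slop": slop,
            "in_order": True,
        }
    }

def words_between_query(field: str, first: str, second: str,
                        max_between: int = 1) -> Dict[str, Any]:
    """Match 'first ... second' with 1..max_between words in between."""
    return {
        "query": {
            "span_not": {
                # Up to max_between words between the two terms ...
                "include": span_near(field, first, second, max_between),
                # ... but drop spans where they are directly adjacent.
                "exclude": span_near(field, first, second, 0),
            }
        }
    }

body = words_between_query("abc", "hello", "world")
print(json.dumps(body, indent=2))
```

Span queries target a single field, so searching several fields would presumably need one such clause per field combined in a `bool` `should`.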

Hi @Mark_Harwood,

I have applied a workaround for this scenario.
Below is the query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "query_string": {
                  "query": "\"Hello World\"",
                  "fields": [],
                  "phrase_slop": 1
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "query_string": {
                  "query": "\"Hello World\"",
                  "fields": []
                }
              }
            ]
          }
        }
      ]
    }
  }
}

This works fine, but there is one issue.

Suppose the index contains the string "Hello test World Hello".
The above query then highlights "Hello test World" and also the last "Hello".
I do not want the last Hello to be highlighted, because no World follows it within phrase_slop: 1.

Is there any way to handle this scenario?

We have several highlighter implementations, contributed by different parties over time, and they differ in their focus, e.g. speed, accuracy, etc.
I lose track of which are better at respecting phrase match accuracy, but it's worth experimenting with the different types.
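
One way to run that experiment is to send the same query several times and change only the highlighter `type`. The sketch below builds such request bodies. The field name `abc` comes from earlier in the thread, the `match_phrase` query is only a placeholder, and the list of types is an assumption: which values are accepted depends on the Elasticsearch version, and `fvh` needs the field to be mapped with `term_vector: with_positions_offsets`.

```python
import json
from typing import Any, Dict, List

def highlight_body(field: str, phrase: str, slop: int,
                   highlighter_type: str) -> Dict[str, Any]:
    """Same query each time; only the highlighter type varies."""
    return {
        "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        "highlight": {"fields": {field: {"type": highlighter_type}}},
    }

# Assumed candidates; check the highlighting docs for your version.
CANDIDATE_TYPES: List[str] = ["plain", "fvh"]

bodies = [highlight_body("abc", "Hello World", 1, t) for t in CANDIDATE_TYPES]
for b in bodies:
    print(json.dumps(b, indent=2))
```

Comparing the highlighted fragments for a document such as "Hello test World Hello" should show whether a given type marks the trailing Hello.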
