Query String Regex/WildCard Search

Sahil5 · January 10, 2022, 5:55am

Hi Team,

Problem Statement
Search String - "Hello * World"
Expected Output -
1. "Hello First World"
2. "Hello Second World"

I need results that contain one word in place of *. I have tried query_string with both regex and wildcard syntax, but neither gives the expected results.
Can anyone please help?

Tomo_M · January 10, 2022, 8:52am

Hi,

I wonder if you realize that "*" also matches spaces.
It depends on which characters your "words" contain, but a regex query with "Hello [!-~]+ World" will match phrases that have exactly one word of printable ASCII characters between Hello and World.

Next time, sharing the actual results alongside the expected ones will get you a more suitable solution.
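
To make the character class concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for the Lucene regexp engine. That substitution is an assumption: the two dialects differ in places, though they should agree on this simple pattern. `fullmatch` is used because Lucene regexps are anchored to the whole term, and the helper name `matches_whole_value` is made up for illustration.

```python
import re

# [!-~] is the ASCII range from '!' (0x21) to '~' (0x7E): every printable
# character except the space. The '+' means "one or more", i.e. one word.
ONE_WORD_GAP = re.compile(r"Hello [!-~]+ World")


def matches_whole_value(value: str) -> bool:
    """Return True if the entire value fits the pattern (anchored match)."""
    return ONE_WORD_GAP.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["Hello First World", "Hello World", "Hello Brave New World"]:
        print(sample, "->", matches_whole_value(sample))
```

Because the class excludes the space, two words in the gap cannot match, and neither can an empty gap.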

Sahil5 · January 10, 2022, 9:16am

Hi Tomo_M,

I have tried the query below:

    {
      "query": {
        "query_string": {
          "query": "/Hello (.*?) World/",
          "fields": ["abc", "xyz"]
        }
      }
    }

The regex correctly matches the words between **Hello** and **World**, but it does not work with query_string and returns 0 results.

Tomo_M · January 10, 2022, 9:24am

What type of field are you querying on: text, keyword or wildcard?

Sahil5 · January 10, 2022, 9:47am

@Tomo_M I am querying on text fields.

Mark_Harwood · January 10, 2022, 10:02am

text fields split your JSON strings into multiple words in the index (e.g. hello, brave, new and world).
Searches always match what is in the index.
The query_string regex search above asks for a single word in the index that contains spaces, which is never going to exist.
Perhaps more simply, you need to use a phrase query to find documents with the multiple words hello and world but near to each other. This can be done using this syntax in query_string:

    "query_string": {
      "query": "\"Hello world\"~2"
    }

The quotes mean "find these words next to each other" and the ~2 modifier means "with up to two word positions different".
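
The point about text fields and single-token regexes can be illustrated with a rough sketch. The `analyze` function below is only an approximation of what an analyzer does (lowercasing and splitting on non-alphanumeric characters); it is an assumption for illustration, not the real Elasticsearch analysis chain, and both function names are made up.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def regex_matches_any_token(pattern: str, text: str) -> bool:
    """Apply the regex to each indexed token separately, never to the whole text."""
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(tok) for tok in analyze(text))


if __name__ == "__main__":
    doc = "Hello First World"
    print(analyze(doc))
    # The pattern needs spaces, but no single token contains one.
    print(regex_matches_any_token(r"Hello (.*?) World", doc))
    # A pattern that fits inside one token does match.
    print(regex_matches_any_token(r"hel.*", doc))
```

Since each token is tested on its own, a pattern that spans several words has nothing to match against, which is consistent with the 0 results reported above.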

Sahil5 · January 10, 2022, 10:20am

Thanks @Mark_Harwood,

I want to fetch the strings that contain at least 1 word between Hello and World.

Tomo_M · January 10, 2022, 11:08am

I wonder if you realize that "*" also matches spaces.

That remark in my first reply applied to the regexp query, not to the query_string query, sorry.

As Mark said, a single regex (sandwiched between '/') is matched against single tokens, and it cannot be used inside phrases.

Wildcard searches are also not supported within phrases.
https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

I'm not sure whether the performance would be acceptable, but one possible way is to use a wildcard field with a regex query such as "Hello [!-~]+ World" or "[hH]ello [!-~]+ [wW]orld". One drawback (or possibly advantage) is that the words are not analyzed, so filters such as stemming do not apply.

Mark_Harwood · January 10, 2022, 11:11am

Did you try?

    "query": "\"Hello world\"~1"

Sahil5 · January 10, 2022, 11:25am

@Mark_Harwood, yes, I have tried the above query, but it also returns strings that contain the exact phrase "Hello World".

I need results that contain at least 1 word between Hello and World.

Mark_Harwood · January 10, 2022, 11:30am

Check out interval queries: more complex, but more powerful. JSON required...

Sahil5 · January 10, 2022, 11:30am

Hi @Tomo_M ,

The wildcard field was introduced in Elasticsearch 7.9.
My current Elasticsearch version is 5.6.

Is there a way to achieve this scenario in Elasticsearch 5.6?

Sahil5 · January 10, 2022, 11:32am

@Mark_Harwood Interval queries are not available in version 5.6.

Is there a way to achieve this scenario in Elasticsearch 5.6?

Mark_Harwood · January 10, 2022, 11:57am

See the forerunner, span queries
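
As a sketch of what a span-based version might look like: a `span_near` with some slop finds "hello ... world" in order, and wrapping it in a `span_not` that excludes the zero-slop (adjacent) case should leave only matches with at least one word in between. The field name `abc` is borrowed from the query earlier in the thread, the lowercased terms assume an analyzer that lowercases tokens, and the helper names and `max_gap` parameter are hypothetical. This has not been run against a 5.6 cluster.

```python
import json


def span_near(field, words, slop):
    """Build a span_near over the given terms, in order, with the given slop."""
    return {
        "span_near": {
            "clauses": [{"span_term": {field: word}} for word in words],
            "slop": slop,
            "in_order": True,
        }
    }


def hello_gap_world(field, max_gap):
    """hello ... world with up to max_gap words between, minus the adjacent case."""
    words = ["hello", "world"]  # span_term is not analyzed, so use indexed forms
    return {
        "query": {
            "span_not": {
                "include": span_near(field, words, max_gap),
                "exclude": span_near(field, words, 0),
            }
        }
    }


if __name__ == "__main__":
    # max_gap=1 means exactly one word in between; raise it for "at least one".
    print(json.dumps(hello_gap_world("abc", 1), indent=2))
```

One caveat: `span_not` drops any included span that overlaps an excluded one, so a text like "Hello Hello World" may be filtered out even though a word sits between the first Hello and World.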

Sahil5 · January 11, 2022, 2:43pm

Hi @Mark_Harwood,

I have applied a workaround for this scenario.
Below is the query:

    {
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "query_string": {
                      "query": "\"Hello World\"",
                      "fields": [],
                      "phrase_slop": 1
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must_not": [
                  {
                    "query_string": {
                      "query": "\"Hello World\"",
                      "fields": []
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }

This works, but I have run into one issue.

Suppose the index contains the string "Hello test World Hello".
The query above then highlights "Hello test World" and also the last "Hello".
I do not want the last Hello highlighted, because no World follows it within phrase_slop: 1.

Is there any way to handle this scenario?

Mark_Harwood · January 11, 2022, 10:36pm

We have several highlighter implementations, contributed by different parties over time, and they differ in their focus, e.g. speed or accuracy.
I lose track of which are better at respecting phrase match accuracy, but it's worth experimenting with the different types.
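
Experimenting with highlighter types could be scripted along these lines. This is only a sketch: the type names `plain`, `fvh` and `unified` are the ones I would try, but which of them a given version accepts is an assumption to check against that version's documentation (and `fvh` needs `term_vector: with_positions_offsets` in the field mapping). The field name `abc` comes from the earlier query and the helper name is made up.

```python
import json

# Highlighter names to try; availability per version is an assumption to verify.
HIGHLIGHTER_TYPES = ["plain", "fvh", "unified"]


def highlight_request(field, highlighter_type):
    """Build a search body that runs the sloppy phrase with one highlighter type."""
    return {
        "query": {
            "query_string": {
                "query": "\"Hello World\"",
                "fields": [field],
                "phrase_slop": 1,
            }
        },
        "highlight": {"fields": {field: {"type": highlighter_type}}},
    }


if __name__ == "__main__":
    # Print one request body per type so the highlighted output can be compared.
    for highlighter_type in HIGHLIGHTER_TYPES:
        print(json.dumps(highlight_request("abc", highlighter_type)))
```

Sending each body to the same index and comparing the highlighted fragments for "Hello test World Hello" would show whether any type leaves the trailing Hello alone.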

system · February 8, 2022, 10:36pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
