Problem Statement
Search String - "Hello * World"
Expected Output - 1. "Hello First World"
2. "Hello Second World"
I need results that contain exactly one word in place of *. I have tried query_string with regex and with wildcards, but neither gives the expected results.
Can anyone please help?
I wonder if you realize that "*" also matches spaces.
It depends on which characters your "words" use. A regexp query with "Hello [!-~]+ World" will match phrases that contain exactly one word of printable ASCII characters between Hello and World.
Next time, sharing the actual results alongside the expected ones will help you get a more suitable answer.
```
{
  "query_string": {
    "query": "/Hello (.*?) World/",
    "fields": ["abc", "xyz"]
  }
}
```
The regex correctly matches words between **Hello** and **World**, but it does not work with query_string and returns 0 results.
Text fields split your JSON strings into multiple words in the index (e.g. hello, brave, new and world).
Searches always match what is in the index.
The above query_string regex search is asking for a single word in the index that contains spaces, which is not going to exist.
Perhaps more simply, you need to use a phrase query to find documents that contain both words, hello and world, near each other. This can be done using this syntax in query_string:
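To make that concrete, here is a minimal Python sketch. It assumes a simplified stand-in for the analyzer (lowercase, then split into word tokens); `analyze` is a hypothetical helper, not an Elasticsearch API. Because a regex in query_string has to match a whole indexed term, `re.fullmatch` is used to mimic that.

```python
import re

def analyze(text):
    """Rough stand-in for a text analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())

tokens = analyze("Hello brave new World")

# The regex from the query above, lowercased to line up with the lowercased tokens.
pattern = re.compile(r"hello (.*?) world")

# The regex is tested against each indexed term separately, never the full string.
matches = [t for t in tokens if pattern.fullmatch(t)]

print(tokens)   # ['hello', 'brave', 'new', 'world']
print(matches)  # []
```

No single token contains a space, so the list of matching terms stays empty, which is consistent with the 0 results reported above.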
"query_string": {
"query": "\"Hello world\"~2"
}
The quotes mean "find these words next to each other" and the ~2 modifier means "allow up to two word positions of difference".
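As a rough illustration of what the slop value allows, here is a small Python sketch for the simple case of two words appearing in order. `min_gap` and `phrase_matches` are hypothetical helpers, not part of Elasticsearch, and the sketch deliberately ignores reordered words, which real sloppy phrase matching can also accept at a higher slop cost.

```python
import re

def min_gap(text, first, second):
    """Smallest number of words between `first` and a later `second`, or None."""
    tokens = re.findall(r"\w+", text.lower())
    best = None
    last_first = None  # position of the most recent `first` seen so far
    for pos, tok in enumerate(tokens):
        if tok == second and last_first is not None:
            gap = pos - last_first - 1
            best = gap if best is None else min(best, gap)
        if tok == first:
            last_first = pos
    return best

def phrase_matches(text, first, second, slop):
    """In-order check only: at most `slop` words may sit between the two terms."""
    gap = min_gap(text, first, second)
    return gap is not None and gap <= slop

for text in ("Hello First World", "Hello brave new World", "Hello World"):
    print(text, phrase_matches(text, "hello", "world", 1))
```

Note that under this model a slop of 1 accepts zero or one word in between, so "Hello World" also matches; requiring exactly one word would need an extra check on top of the phrase query.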
I'm not sure whether the performance is acceptable, but one possible way is to use a wildcard field with a regexp query such as "Hello [!-~]+ World" or "[hH]ello [!-~]+ [wW]orld". One drawback (or possibly an advantage) is that the text is not analyzed, so filters such as stemming are not applied.
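A sketch of what that could look like, written as Python dicts that serialize to the request bodies. The field name `abc` is borrowed from the earlier query. Two things here are assumptions: the `+` after `[!-~]` (so the class covers a whole word rather than a single character) and the `.*` on both ends (the regexp has to cover the entire field value, so they let it match inside a longer string).

```python
import json
import re

# Hypothetical mapping: index "abc" as a `wildcard` field so the whole string
# is stored unanalyzed instead of being split into words.
mapping = {"mappings": {"properties": {"abc": {"type": "wildcard"}}}}

# [!-~]+ is one run of printable, non-space ASCII characters, i.e. one "word".
pattern = ".*[hH]ello [!-~]+ [wW]orld.*"
query = {"query": {"regexp": {"abc": {"value": pattern}}}}

print(json.dumps(mapping))
print(json.dumps(query))

# Local sanity check with Python's re module. This simple pattern is assumed
# to behave the same way under Lucene's regexp syntax.
def one_word_between(text):
    return re.fullmatch(pattern, text) is not None

for text in ("Hello First World", "hello second world", "Hello brave new World"):
    print(text, one_word_between(text))
```

Unlike the phrase query with slop, this pattern requires exactly one word in between, at the cost of losing analysis features such as stemming.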
This works fine, but I am running into one issue.
Suppose the index contains the string "Hello test World Hello".
Then the above query highlights Hello test World and also the last Hello.
I do not want the last Hello highlighted, because no World follows it within phrase_slop: 1.
We have several highlighter implementations, contributed by different parties over time, and they differ in their focus, e.g. speed, accuracy, etc.
I've lost track of which are better at respecting phrase-match accuracy, but it's worth experimenting with the different types.
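For experimenting, a minimal sketch that builds the same search once per highlighter type. The field names `abc` and `xyz` and the `phrase_slop` of 1 are taken from earlier in this thread; `search_body` is a hypothetical helper. Which type, if any, stops highlighting the trailing Hello is not something established here, so treat this only as a starting point for comparison.

```python
import json

def search_body(highlighter_type):
    """Build the phrase search from this thread with a chosen highlighter type."""
    return {
        "query": {
            "query_string": {
                "query": "\"Hello World\"",
                "fields": ["abc", "xyz"],
                "phrase_slop": 1,
            }
        },
        "highlight": {
            "type": highlighter_type,
            "fields": {"abc": {}, "xyz": {}},
        },
    }

# "fvh" additionally needs term_vector set to with_positions_offsets in the mapping.
bodies = {t: search_body(t) for t in ("unified", "plain", "fvh")}
for name, body in bodies.items():
    print(name, json.dumps(body))
```

Each printed body can be sent to the `_search` endpoint of the index and the returned highlight fragments compared side by side.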