The standard analyzer splits on * so a regular expression that expects to find a * would never match any documents. Indeed, a whitespace tokenizer might work.
I'd also like to point out that regexp queries are very slow, especially when the pattern starts with a wildcard as it does here, so I would advise doing things differently if possible. For instance, this particular use case could be solved by indexing 3-grams.
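To make the 3-gram idea concrete, here is a minimal Python sketch. The settings dict shows one plausible shape for an `ngram` tokenizer with `min_gram` and `max_gram` both set to 3. The names `trigram_tokenizer` and `trigram_analyzer` are made up for illustration and do not come from this thread. The `trigrams` helper only simulates the tokenizer locally, to show why a "contains" lookup turns into a handful of exact term lookups instead of a leading-wildcard regexp.

```python
# Hypothetical index settings; analyzer and tokenizer names are illustrative.
trigram_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "trigram_tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "trigram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}


def trigrams(text):
    """Local simulation: every 3-character window of the lowercased text."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]


def could_contain(indexed_value, fragment):
    """True if every 3-gram of the fragment is among the indexed 3-grams.

    This is a necessary condition for a substring match, not a sufficient
    one: the grams could all be present without being adjacent.
    """
    indexed = set(trigrams(indexed_value))
    wanted = trigrams(fragment)
    return bool(wanted) and all(gram in indexed for gram in wanted)
```

Because the grams keep characters such as `*`, a fragment like `o*b` can be found in `foo*bar` without any regular expression. Fragments shorter than three characters produce no grams, so they would need separate handling.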
Thanks for your quick answer Adrien.
As I understand it, if we use the default analyzer, which is the standard analyzer, there is no way to search for the reserved symbols at all: . ? + * | { } [ ] ( ) " \
Solution 1: Index 3-grams, which means adding an n-gram tokenizer (3-grams in our case) to a custom analyzer.
Solution 2: Use a pattern tokenizer in a new custom analyzer?
One more question:
Is this part of the documentation not about the case where we use the standard analyzer?
Allowed characters
Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped. The standard reserved characters are:
. ? + * | { } [ ] ( ) " \
If you enable optional features (see below) then these characters may also be reserved:
# @ & < > ~
Any reserved character can be escaped with a backslash "\*" including a literal backslash character: "\\"
Additionally, any characters (except double quotes) are interpreted literally when surrounded by double quotes:
This is generic advice. Obviously it is irrelevant with analyzers that split on those characters. It does apply, for instance, if your field is mapped as a keyword.
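For the keyword case, the escaping rule quoted above can be sketched as a small Python helper. This is only an illustration: the function names and the field name `name.raw` are hypothetical, and the reserved-character sets are copied from the documentation excerpt in this thread.

```python
# Reserved characters as listed in the documentation excerpt above.
RESERVED = set('.?+*|{}[]()"\\')
# Characters that may also be reserved when optional features are enabled.
OPTIONAL_RESERVED = set('#@&<>~')


def escape_regexp(literal, optional_features=False):
    """Backslash-escape every reserved character in a literal string."""
    reserved = (RESERVED | OPTIONAL_RESERVED) if optional_features else RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in literal)


def contains_literal_query(field, literal):
    """Build a regexp query body matching values that contain the literal.

    Only meaningful on a field whose terms still contain the characters,
    e.g. a keyword field. The leading ".*" keeps the query slow.
    """
    pattern = ".*" + escape_regexp(literal) + ".*"
    return {"query": {"regexp": {field: {"value": pattern}}}}


# Hypothetical field name, used only for illustration.
body = contains_literal_query("name.raw", "a*b")
```

The escaping only makes the pattern correct. It does nothing about the cost of the leading wildcard, and it cannot help on a field where the analyzer has already dropped the character.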
When we use the standard analyzer on an analyzed field, is there also no way to search for a value containing a space?
For instance, searching for first name [space] surname (e.g. Hayk Hovhannisyan) does not return any results.
If the user enters, for example, "Ha Hov", it should find "Hayk Hovhannisyan" as well, and the terms need to be combined with AND.
At the moment I am using a regexp query on an analyzed field for this.
What would you suggest for that?
The usual way that this would be done would be to use an edge-ngram filter in the index analyzer (but not in the search analyzer) and then use regular match queries for searching.
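As a rough sketch of that setup in Python: the index body below adds an `edge_ngram` token filter to the index analyzer only, and the search uses a plain `match` query with `operator` set to `and`. The names `name_edge_ngram`, `name_index`, `name_search` and `fullname`, as well as the gram sizes, are assumptions for illustration, and the typeless `mappings` shape assumes a recent Elasticsearch version. The two helper functions only simulate the analysis locally, using a whitespace split as a stand-in for the standard tokenizer.

```python
# Hypothetical index body; all names and gram sizes are illustrative.
edge_ngram_index = {
    "settings": {
        "analysis": {
            "filter": {
                "name_edge_ngram": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
            "analyzer": {
                # Index time: emit every prefix of every word.
                "name_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "name_edge_ngram"],
                },
                # Search time: no n-grams, the user input is used as typed.
                "name_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "fullname": {
                "type": "text",
                "analyzer": "name_index",
                "search_analyzer": "name_search",
            }
        }
    },
}

# Regular match query; "and" requires every typed word to match.
search_body = {"query": {"match": {"fullname": {"query": "Ha Hov", "operator": "and"}}}}


def index_terms(text, min_gram=1, max_gram=20):
    """Local simulation of the index analyzer: prefixes of each lowercased word."""
    terms = set()
    for word in text.lower().split():
        for n in range(min_gram, min(len(word), max_gram) + 1):
            terms.add(word[:n])
    return terms


def matches_and(indexed_text, user_input):
    """True if every word the user typed is an indexed prefix term."""
    terms = index_terms(indexed_text)
    words = user_input.lower().split()
    return bool(words) and all(word in terms for word in words)
```

In this simulation, "Ha Hov" matches "Hayk Hovhannisyan" because both "ha" and "hov" were indexed as prefixes, while "ayk" does not match because only prefixes are indexed, not arbitrary substrings.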