We have an index mapping schema with a lot of text fields. To be able to sort and filter on them, we added a keyword subfield with a lowercase normalizer to each one. Here is a short part of our schema:
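A minimal sketch of a mapping of this shape, written as a Python dict that can be serialized as the body of an index creation request. The field name `name` and the normalizer name `lowercase_normalizer` are hypothetical placeholders, not the actual schema from this thread.

```python
import json

# Hypothetical index body: one text field with a keyword subfield that is
# normalized to lowercase, so it can be used for sorting and filtering.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Placeholder name; a custom normalizer that only lowercases.
                "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            # Placeholder field name.
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "normalizer": "lowercase_normalizer",
                    }
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```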
But it returns incorrect results (expected are 2 and 3):
If we have a text field with a keyword subfield, it returns results 3 and 4
If we remap to have only a text field, it returns only result 3
If we remap to have only a text field and add the built-in simple analyzer to it, it returns the expected results
If we have a text field with a keyword subfield and add the built-in simple analyzer to the text field, it returns 2, 3 and 4
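One way to reason about these four outcomes is to approximate each analyzer in plain Python and test the pattern against whole indexed terms. This is only a sketch under several assumptions: documents 1 to 4 are assumed to hold `selection`, `electron.jpg`, `election` and `ele cti on` (the values mentioned in the replies), the pattern is assumed to be `ele*on`, the query is assumed to run against every field including the keyword subfield, and the three analyzer functions are rough stand-ins, not Lucene's real tokenizers.

```python
import re
from fnmatch import fnmatchcase

# Assumed document values, keyed by the result numbers used in the thread.
DOCS = {1: "selection", 2: "electron.jpg", 3: "election", 4: "ele cti on"}

def standard_like(text):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    # Assumption: a dot between letters stays inside one token.
    return text.lower().split()

def simple_like(text):
    # Rough stand-in for the simple analyzer: lowercase, split on non-letters.
    return re.findall(r"[^\W\d_]+", text.lower())

def keyword_lowercase(text):
    # Keyword field with a lowercase normalizer: the whole value is one term.
    return [text.lower()]

def matching_ids(pattern, analyzers):
    """Return the IDs whose terms, from any given analyzer, fully match pattern."""
    hits = set()
    for doc_id, value in DOCS.items():
        for analyze in analyzers:
            if any(fnmatchcase(term, pattern) for term in analyze(value)):
                hits.add(doc_id)
    return sorted(hits)

scenarios = {
    "text + keyword": [standard_like, keyword_lowercase],
    "text only": [standard_like],
    "text (simple) only": [simple_like],
    "text (simple) + keyword": [simple_like, keyword_lowercase],
}
for label, analyzers in scenarios.items():
    print(label, matching_ids("ele*on", analyzers))
```

Under these assumptions the sketch gives `[3, 4]`, `[3]`, `[2, 3]` and `[2, 3, 4]`, the same as the four observations above. That suggests the extra hit comes from the keyword subfield, where `ele cti on` is a single term that starts with `ele` and ends with `on`.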
What are we missing here? What options do we have?
Please note, that we need to support sorting, filtering (which is available with a keyword subfield), and a full-text wildcard query with an asterisk in the middle.
Normally, users don't enter wildcards in a search engine. I never do this in the Google search bar, for example.
Instead, you should look at the wildcard field type if you really want to use wildcards.
But it returns incorrect results (expected are 2 and 3):
ele*on matches ele cti on IMO... But I understand what you mean. You want to compare full terms, right? So you want to compare ele*on with selection, electron.jpg, election, ele, cti and on, right?
So you need to find an analyzer which does exactly this. I'd use a custom analyzer and use the _analyze API to better understand how to build the right one for your use case. See "Test an analyzer" in the Elasticsearch Guide [8.8].
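As a rough sketch of what that experiment could look like, the helper below builds request bodies for the `_analyze` endpoint, either with a named analyzer or with an ad hoc tokenizer and filter chain. The helper name `analyze_body` is made up for this example, and sending the body (for example as `POST /_analyze`) is left out, so nothing here talks to a cluster.

```python
import json

def analyze_body(text, analyzer=None, tokenizer=None, token_filters=None):
    """Build a JSON-serializable body for the _analyze API (hypothetical helper)."""
    body = {"text": text}
    if analyzer is not None:
        # Test a built-in or already configured analyzer by name.
        body["analyzer"] = analyzer
    else:
        # Test an ad hoc chain: one tokenizer plus optional token filters.
        body["tokenizer"] = tokenizer
        body["filter"] = list(token_filters or [])
    return body

# Compare how different analysis chains would split the same value.
candidates = [
    analyze_body("electron.jpg", analyzer="standard"),
    analyze_body("electron.jpg", analyzer="simple"),
    analyze_body("ele cti on", tokenizer="letter", token_filters=["lowercase"]),
]
for body in candidates:
    print(json.dumps(body))
```

Comparing the tokens returned for each body should show which chain produces the terms you want the pattern to be matched against.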
I'd recommend looking at ngrams instead of using wildcards.
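To make the idea concrete, here is a small pure-Python sketch of what n-gram tokenization produces for one term. It only imitates the concept; `ngrams` is a helper written for this example, with `min_gram` and `max_gram` named after the corresponding tokenizer settings. With the pieces indexed up front, a partial input such as `ele` can match as an ordinary term, with no wildcard expansion at query time.

```python
def ngrams(term, min_gram, max_gram):
    """Return all substrings of term whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(term)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(term):
                grams.append(term[start:start + size])
    return grams

print(ngrams("election", 3, 3))
# ['ele', 'lec', 'ect', 'cti', 'tio', 'ion']
```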
We've read about that, but for now we decided to start this way since we are migrating from Azure Search and we use a similar approach there (Azure Search is also built on top of the Lucene engine). For other scenarios (including trailing and leading wildcard querying), everything works fine.
Regarding the wildcard field type: as far as I understand, we can't do a full-text search with this field, we have to add a specific field in wildcard query?
You got it right! As I mentioned, we tried a simple analyzer for a text field only. This works as expected (search response contains electron.jpg and election):
But as soon as we add a keyword subfield, the search response starts returning electron.jpg, election, and also ele cti on. We found this weird, since we thought that the keyword subfield would be handled separately from the main text field.
Thanks for suggesting ngrams! Will we be able to support our scenarios with them (both full-text and search within a specific field)?
I think (from what I recall), that Azure Search was actually built on top of Elasticsearch. But that's another story.
we can't do a full-text search with this field, we have to add a specific field in wildcard query?
Indeed. So normally I recommend doing multiple searches at the same time. Combining scores between partial match and exact match is normally super helpful for the end users. See the following script as an idea:
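A minimal sketch of that idea, not the script from the original post: a `bool` query with two `should` clauses, so that an exact match and a partial match both contribute to the score. The field names `name` and `name.ngram` and the boost value are assumptions for illustration only.

```python
import json

def combined_query(text, exact_field="name", partial_field="name.ngram"):
    """Build a bool query scoring exact and partial matches together.

    Both field names are hypothetical; partial_field is assumed to be a
    subfield indexed with an n-gram based analyzer.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact term match, boosted so it ranks first.
                    {"match": {exact_field: {"query": text, "boost": 2.0}}},
                    # Partial match against the n-gram subfield.
                    {"match": {partial_field: {"query": text}}},
                ]
            }
        }
    }

print(json.dumps(combined_query("election"), indent=2))
```

Documents that satisfy both clauses score higher than those that satisfy only the partial one, which is what pushes exact matches to the top.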
Will we be able to support our scenarios with them (both full-text and search within a specific field)?
As far as I understand, your recommendation is to do multiple wildcard searches for each field in our index document if we want to achieve a full-text search with a wildcard query. Like:
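A sketch of what such a request might look like, assuming one `wildcard` clause per field combined in a `bool` query. The helper name and the field names in the usage line are hypothetical, not our real index fields.

```python
import json

def wildcard_across_fields(pattern, fields):
    """Build a bool query with one wildcard clause per field (any may match)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"wildcard": {field: {"value": pattern}}} for field in fields
                ],
                # At least one of the per-field clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }

# Hypothetical field names, for illustration only.
print(json.dumps(wildcard_across_fields("ele*on", ["name", "description"]), indent=2))
```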