Running wildcard queries on keyword fields has two problems:
- They won't work on large values
- The search cost is linear in the number of unique values
That’s why the wildcard field was created, and this blog gives the background. It has shortcomings too: the search cost is linear in the number of docs that hold a value that roughly matches the search.
There’s always some kind of performance trade-off.
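To make the comparison concrete, here is a minimal sketch of what mapping a field as `wildcard` and querying it could look like. It only builds the JSON request bodies in Python and sends nothing; the index name `logs-sample` and the field name `message` are made up for illustration, so swap in your own.

```python
import json

# Hypothetical names, for illustration only (not from the thread).
INDEX = "logs-sample"
FIELD = "message"


def wildcard_mapping(field):
    """Body for `PUT /<index>` that maps `field` with the `wildcard` type."""
    return {"mappings": {"properties": {field: {"type": "wildcard"}}}}


def wildcard_query(field, pattern):
    """Body for `GET /<index>/_search` using a wildcard query on `field`."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


# Print the bodies so they can be pasted into Kibana Dev Tools or curl.
print(f"PUT /{INDEX}")
print(json.dumps(wildcard_mapping(FIELD), indent=2))
print(f"GET /{INDEX}/_search")
print(json.dumps(wildcard_query(FIELD, "*timeout*"), indent=2))
```

The same query body works against a `keyword` field, so one way to compare the two is to index the same data under both types and time the searches.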
Totally agree, performance issues should be considered for this use case. The blog post above has a guide to choosing a data type. If you are not sure which data type to use, one possible approach is to create a new sample index with the desired type to explore your data, and use the reindex API to reindex part or all of the production index into it.
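As a rough sketch of that approach, the request body for `POST /_reindex` could be built like this. The index names `prod-logs` and `logs-sample` are hypothetical, and the sample index is assumed to already exist with the mapping you want to try; `max_docs` limits the copy to part of the source index.

```python
import json


def reindex_body(source_index, dest_index, max_docs=None):
    """Body for `POST /_reindex`, copying from source_index into dest_index.

    If max_docs is given, only that many documents are copied, which is
    enough to explore a sample without reindexing everything.
    """
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    if max_docs is not None:
        body["max_docs"] = max_docs
    return body


# Hypothetical index names; copy only 10,000 docs into the sample index.
print("POST /_reindex")
print(json.dumps(reindex_body("prod-logs", "logs-sample", max_docs=10000), indent=2))
```

Leaving out `max_docs` reindexes the whole source index.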