What’s the cardinality of values in this field?
As the flowchart at the end of the blog advises, the wildcard field is designed for use on fields with millions of unique values.
Keyword fields get bogged down by large numbers of unique terms, while wildcard fields get bogged down by large numbers of docs that share a common term (i.e. high-cardinality vs. low-cardinality fields).
Mostly unique values but there is repetition of words within them - typical log data.
I've ended up using a default text field instead of wildcard, with an ngram tokenizer.
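For readers unfamiliar with the ngram approach, here is a rough sketch of what such a setup might look like. The poster's actual settings are not shown in the thread, so the tokenizer name, analyzer name, gram sizes and lowercase filter below are all assumptions made for illustration; only the field name "result" comes from the discussion. The small helper just illustrates the kind of terms an ngram tokenizer emits.

```python
def ngrams(text, min_gram=1, max_gram=2):
    """Emit every substring of length min_gram..max_gram, position by position."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

# Hypothetical index body: "log_ngrams" and "log_ngram_analyzer" are made-up
# names, and the gram sizes are an assumption, not the poster's real config.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "log_ngrams": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "log_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "log_ngrams",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "result": {"type": "text", "analyzer": "log_ngram_analyzer"}
        }
    },
}

print(ngrams("fox"))  # ['f', 'fo', 'o', 'ox', 'x']
```

The trade-off is index size: every value is broken into many small overlapping terms, which is the cost of being able to match arbitrary character sequences with an ordinary text field.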
Just reviewing this topic again and a couple of things jumped out:
Keyword values longer than 256 characters (the ignore_above limit in the default dynamic mapping) are completely ignored, not truncated. They are completely missing from the index.
Your example request was searching the text field and not the keyword field, which would need to be queried by the ‘result.keyword’ name.
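The two points above can be sketched roughly as follows. The mapping mirrors the shape that dynamic mapping typically produces for a string field; the sample values and the "*error*" pattern are invented for illustration, and the filtering function is only a simulation of the ignore_above behaviour, not Elasticsearch code.

```python
# Typical shape of a dynamically mapped string field: a text field with a
# keyword subfield. "result" is the field name used in this thread.
default_mapping = {
    "properties": {
        "result": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    }
}

def keyword_indexed(values, ignore_above=256):
    """Simulate ignore_above: over-long values are skipped, never truncated."""
    return [v for v in values if len(v) <= ignore_above]

def wildcard_query(field, pattern):
    """Build a wildcard query body for the given field name."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}

values = ["short value", "x" * 300]  # made-up sample values
limit = default_mapping["properties"]["result"]["fields"]["keyword"]["ignore_above"]

print(keyword_indexed(values, limit))  # ['short value']

# "result" targets the analyzed text field; "result.keyword" targets the subfield.
text_query = wildcard_query("result", "*error*")
keyword_query = wildcard_query("result.keyword", "*error*")
```

Note that the 300-character value does not appear in shortened form: as far as the keyword subfield is concerned, it was never indexed at all.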
Querying a text field may be faster (fewer unique terms in the index) but your query scope is different - you’re searching within the confines of single terms/words rather than for any possible character sequence in the original value. The blog highlights why word-based text indexes are of less use in machine generated content where there’s no common agreement between searcher and search engine as to what constitutes a word.
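A small sketch of that difference in scope, using Python's fnmatch as a stand-in for wildcard matching and a crude regex split as a stand-in for a text analyzer. Both are simplifications, and the sample log line is invented; real analyzers and the wildcard query behave differently in detail.

```python
import re
from fnmatch import fnmatchcase

def tokens(value):
    """Crude stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

def matches_whole_value(value, pattern):
    """Pattern is tested against the complete original string."""
    return fnmatchcase(value, pattern)

def matches_any_token(value, pattern):
    """Pattern is tested against each word separately, as on a text field."""
    return any(fnmatchcase(token, pattern) for token in tokens(value))

line = "GET /api/v1/users/42 HTTP/1.1"  # made-up log line

print(matches_whole_value(line, "*v1/users*"))  # True
print(matches_any_token(line, "*v1/users*"))    # False
print(matches_any_token(line, "user*"))         # True
```

The pattern that spans a word boundary only matches when the whole value is searched, because no single token contains the "/" that the tokenizer threw away.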