The problem I'm facing is that I can't create a test index with a wildcard field. From the documentation, I understand that the wildcard field type belongs to X-Pack, but that this particular X-Pack feature is available under the Basic license and comes preinstalled in the version I'm running: 7.9
Am I missing something?
Furthermore, I think I might run into other problems with this field. Since I'm working with Spanish values extracted from PDFs, I found that the best way to normalize my content was the icu_analyzer, which helped me with folding special characters. If I get this wildcard field working, can it be combined with the icu filters?
Let me know if there's any relevant information that I've missed in this post and I will provide it ASAP.
Thanks!
I hope this is useful (I've erased license uid and name). I am already exploring other solutions without wildcard, but it would be great to solve this as well...
Thank you for your time.
Currently the wildcard field does not support normalizers. It is generally targeted at machine-generated content, which is not served well by the "opinionated" language processors typically used for text. While the meaning of words in text doesn't change with some light processing, the sorts of things machines understand (URLs, file paths, package names) do change meaning if you alter them in any way. Some file systems are case insensitive, so we will offer search options for case-insensitive matching, but any further normalisation isn't something we've felt the need for yet.
What sort of content do you have and why is a text field not a good fit?
I was trying to run queries that behave like "match_phrase", so that they enforce the order constraint between tokens, but with the last word treated as a prefix (as match_phrase_prefix does) and the first word treated as a suffix (for which I don't think there's a built-in feature).
Example:
Performing the search for "joaquin ormachea" would be matching the documents containing:
He is Joaquin Ormachea
He isJoaquin Ormachea
He is Joaquin Ormachea.
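To make the intended matching concrete, here is a small Python sketch of the behaviour described above. It is only a local illustration using a regular expression, not an Elasticsearch query; the function name `phrase_with_open_ends` is hypothetical, and treating "suffix" and "prefix" as optional extra word characters is an assumption about what is wanted.

```python
import re

def phrase_with_open_ends(query: str) -> re.Pattern:
    """Build a regex that keeps token order, lets the first token end a
    longer word (suffix match) and the last token start one (prefix match)."""
    tokens = [re.escape(token) for token in query.split()]
    body = r"\s+".join(tokens)  # tokens must appear in order, separated by whitespace
    return re.compile(r"\w*" + body + r"\w*", re.IGNORECASE)

pattern = phrase_with_open_ends("joaquin ormachea")
docs = [
    "He is Joaquin Ormachea",
    "He isJoaquin Ormachea",
    "He is Joaquin Ormachea.",
]
for doc in docs:
    print(doc, "->", bool(pattern.search(doc)))
```

All three example documents match, while a document with the words in the opposite order would not.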
What I've been trying today was building a combined analyzer. This is what I've ended up with:
Wildcard fields, like keyword fields, do not support phrase-type queries, because these queries are used to find sequences of words and these fields logically contain only one big word.
Instead, wildcard is optimised for finding character sequences anywhere inside a string using regex or wildcard queries.
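As a rough sketch of what that looks like in practice, the Python below builds the JSON bodies for an index mapping with a `wildcard` field and for a wildcard query that looks for a character sequence anywhere in the value. The field name `content` and the helper `wildcard_query` are hypothetical, the bodies are only printed rather than sent to a cluster, and the sketch assumes the fragment contains no `*` or `?` of its own. Since the reply above describes case-insensitive options as still to come, the fragment presumably has to match the stored casing.

```python
import json

# Body for creating an index whose "content" field (hypothetical name)
# uses the wildcard field type.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "content": {"type": "wildcard"}
        }
    }
}

def wildcard_query(field: str, fragment: str) -> dict:
    """Return a search body matching `fragment` anywhere inside `field`."""
    return {
        "query": {
            "wildcard": {
                field: {"value": f"*{fragment}*"}
            }
        }
    }

print(json.dumps(INDEX_BODY, indent=2))
print(json.dumps(wildcard_query("content", "Joaquin Ormachea"), indent=2))
```

Note that this matches a raw character sequence, so it does not tokenize or fold characters the way an analyzed text field would.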