I'm playing with building some Kibana dashboards and I noticed a peculiar behavior: when I try to filter documents with very simple prefix queries on a single string field (e.g. field:value*), some fields can be queried this way while others return no results.
All queries work if I try to filter by an actual full field value, i.e. field1:value or field2:somestring
For some fields I get the expected results when I run a prefix query, i.e. field1:val*
For other fields I get nothing, i.e. field2:some* returns no results even though field2:something returns results
The frustrating part is that I haven't been able to figure out what the difference is between those fields that makes them behave differently. All fields are strings, they are not analyzed and the _all field is disabled. Prefix queries with similar intent using the Elasticsearch query language on the source data always work as expected.
I figured out what's causing the wildcard queries to break - it's the uppercase letters in the query pattern.
For a field "field" with a value "string_a", a query like field:strin* will return results.
For a value of "strING_a" (capitalized "ING") the query field:str* will return results, but the query field:strING* will return no results. Neither will field:string*.
Basically, it looks like only lowercase characters can match each other in the query string and in the value. * will match any lowercase or uppercase characters, but uppercase characters don't match each other, and a lowercase and uppercase character don't match each other either. Again, this is only in the Lucene flavor used by Kibana queries; Elasticsearch queries do not have this problem.
I found some other questions about this behavior online, but with no answers.
I don't know what the motivation behind implementing such a behavior was (I assume there's got to be an important technical limitation involved), but to anyone like me who comes from a traditional programming background this is mind-boggling and terribly unintuitive - for the life of me I didn't want to believe that an uppercase character does not match itself.
Are there any workarounds (besides making sure all values you want to use in wildcard queries are lowercase)?
Kibana simply passes the query to Elasticsearch, so the issue here is the Lucene query syntax and how the standard analyzer within Elasticsearch behaves. By default, all tokens are converted to lowercase, and wildcard terms in the query string are lowercased as well, which is why the uppercase letters in your queries fail. You can change this by setting "lowercase_expanded_terms" to false or by using something other than the standard analyzer.
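For reference, here is a rough sketch of what a request body with that setting might look like when sent to Elasticsearch directly. It assumes an Elasticsearch version whose query_string query still accepts "lowercase_expanded_terms" (later releases dropped the parameter); the field name field2 and the pattern are taken from the examples in this thread, and the variable name body is just for illustration.

```python
import json

# Sketch of a query_string request body with wildcard lowercasing turned off.
# Assumption: the target Elasticsearch version still supports the
# "lowercase_expanded_terms" parameter on query_string queries.
body = {
    "query": {
        "query_string": {
            "query": "field2:strING*",          # pattern keeps its original case
            "lowercase_expanded_terms": False,  # do not lowercase wildcard terms
        }
    }
}

# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

With the setting off, the pattern has to match the stored value's case exactly, so strING* would match "strING_a" but string* still would not.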
Thanks for following up. Indeed the Lucene query is the issue here. I knew that tokens are lower-cased, however, the surprise was that even if I express the query using lowercase, the match still does not succeed.
That is, for a value "strING_a" the query "str*" succeeds and the query "strING*" fails, but the query "string*" also fails. If values were tokenized to lowercase, shouldn't the last query succeed?
Just to close on a potential solution I was looking at - making the prefix of the string value I was going to search for all lowercase solved the problem.
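The results reported in this thread are consistent with a simple model: the value of a not-analyzed field is stored with its original case, while the wildcard pattern is lowercased before matching. The sketch below only simulates that model in plain Python (it does not call Elasticsearch or Lucene); the function name wildcard_matches and its flag are hypothetical, and fnmatchcase stands in for a case-sensitive wildcard match.

```python
from fnmatch import fnmatchcase


def wildcard_matches(stored_value, pattern, lowercase_expanded_terms=True):
    """Simulate a wildcard query against a not-analyzed (case-preserving) value.

    Assumption: the query parser lowercases the pattern by default, while the
    stored value keeps whatever case it was indexed with.
    """
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    # fnmatchcase compares case-sensitively; '*' matches any run of characters.
    return fnmatchcase(stored_value, pattern)


# Reproduce the three queries discussed above against the value "strING_a".
for pattern in ["str*", "strING*", "string*"]:
    print(pattern, wildcard_matches("strING_a", pattern))
```

Under this model "str*" matches because the lowercased pattern agrees with the stored prefix, "strING*" fails because it is turned into "string*" before matching, and "string*" fails because the stored value itself still contains uppercase letters. It also explains why storing the prefix in lowercase makes the queries work.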