My input tokens look like this:
abcd-123
ABCD-123
abCD-123
Right now I don't analyze them at all, but that comes back to bite me if
someone searches for them using different casing.
So I want to use, or put together, an analyzer that doesn't break these
tokens apart but still lets them be matched by a case-insensitive
search later. Any suggestions?
index :
    analysis :
        analyzer :
            lowerKeyword :
                type : custom
                tokenizer : keyword
                filter : [lowercase]
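To make the effect of that configuration concrete, here is a small plain-Java sketch of what a keyword tokenizer followed by a lowercase filter does to the tokens from the question. It does not call Elasticsearch or Lucene; the class name LowerKeywordSketch and the method analyze are made up for illustration, and the real analyzer may differ in locale handling and other details.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class LowerKeywordSketch {

    // Hypothetical stand-in for "keyword tokenizer + lowercase filter":
    // the whole input stays one token (no split on '-'), and it is lowercased.
    static List<String> analyze(String input) {
        return Collections.singletonList(input.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = {"abcd-123", "ABCD-123", "abCD-123"};
        for (String sample : samples) {
            // Each variant should end up as the same single token.
            System.out.println(sample + " -> " + analyze(sample));
        }
    }
}
```

Because the same analysis would be applied to the query at search time, all three spellings should end up matching the same indexed term.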
Thanks a lot, Matt!
Can anyone also tell me how to set this up programmatically in Java?
I don't know how to give the analyzer a name (like lowerKeyword).
Here's what I have so far:
indexerSettings.put("analysis.analyzer.events.type", "custom");
indexerSettings.put("analysis.analyzer.events.tokenizer", "keyword");
indexerSettings.put("analysis.analyzer.events.filter", "lowercase");
Cihat Keser, over at Jest (https://github.com/searchbox-io/Jest), pointed
out that the string "events" in settings keys such as
"analysis.analyzer.events.type" is what serves as the analyzer's name.
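As a rough illustration of that point, the sketch below builds the same three settings keys for an arbitrary analyzer name. It uses a plain java.util.Map as a stand-in for whatever settings object indexerSettings is in the snippet above, and the class AnalyzerSettingsSketch and the method lowercaseKeywordAnalyzer are hypothetical helpers, not part of any Elasticsearch or Jest API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnalyzerSettingsSketch {

    // Hypothetical helper: the analyzer name is simply the path segment
    // that follows "analysis.analyzer." in each settings key.
    static Map<String, String> lowercaseKeywordAnalyzer(String name) {
        String prefix = "analysis.analyzer." + name + ".";
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(prefix + "type", "custom");
        settings.put(prefix + "tokenizer", "keyword");
        settings.put(prefix + "filter", "lowercase");
        return settings;
    }

    public static void main(String[] args) {
        // Using "lowerKeyword" instead of "events" names the analyzer lowerKeyword.
        Map<String, String> settings = lowercaseKeywordAnalyzer("lowerKeyword");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Each key/value pair could then be passed to the same put(...) calls shown in the question.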