How can I put together a case-insensitive analyzer for tokens?


(pulkitsinghal) #1

My input tokens are like:
abcd-123
ABCD-123
abCD-123

Right now I don't analyze them at all, but that comes back to bite me
whenever someone searches for them using different letter case.

So I want to use or put together an analyzer that doesn't break these
tokens apart but still normalizes them so that searches are
case-insensitive. Any suggestions?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Matt Weber) #2

Configure a custom analyzer with the keyword tokenizer and the lowercase
token filter:

    index:
      analysis:
        analyzer:
          lowerKeyword:
            type: custom
            tokenizer: keyword
            filter: [lowercase]
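
To illustrate what this combination does to the sample tokens, here is a minimal plain-Java sketch. It does not use Elasticsearch or Lucene; the class `LowerKeywordSketch` and its methods are hypothetical stand-ins that only mimic the idea: the keyword tokenizer emits the entire input as one token, and the lowercase filter then lowercases that token.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in for a keyword tokenizer + lowercase filter chain.
// This is NOT the Elasticsearch/Lucene implementation, only an illustration.
public class LowerKeywordSketch {

    // Keyword-style tokenization: the whole input is kept as a single token.
    static List<String> keywordTokenize(String input) {
        return Collections.singletonList(input);
    }

    // Lowercase-style filtering: each token is lowercased.
    static String lowercase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Run the tokenizer first, then apply the filter to every token.
    static List<String> analyze(String input) {
        return keywordTokenize(input).stream()
                .map(LowerKeywordSketch::lowercase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abcd-123", "ABCD-123", "abCD-123"}) {
            // Each input yields exactly one token: abcd-123
            System.out.println(s + " -> " + analyze(s));
        }
    }
}
```

All three inputs end up as the same single token, and the hyphen does not split them. Provided the same analyzer is applied to the search text, a query in any case should match.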

Thanks,
Matt Weber

(In reply to pulkitsinghal's message of Tue, Nov 19, 2013, 9:16 AM.)



(pulkitsinghal) #3

    index:
      analysis:
        analyzer:
          lowerKeyword:
            type: custom
            tokenizer: keyword
            filter: [lowercase]

Thanks a lot Matt!

Can anyone also tell me how to set this up programmatically in Java?
I don't know how to give the analyzer a name (like lowerKeyword).
Here's what I have so far:
indexerSettings.put("analysis.analyzer.events.type", "custom");
indexerSettings.put("analysis.analyzer.events.tokenizer", "keyword");
indexerSettings.put("analysis.analyzer.events.filter", "lowercase");



(pulkitsinghal) #4

Cihat Keser, from the Jest project (https://github.com/searchbox-io/Jest),
pointed out that the string "events" in the code block below is what serves
as the analyzer's name:

 indexerSettings.put("analysis.analyzer.events.type", "custom");
 indexerSettings.put("analysis.analyzer.events.tokenizer", "keyword");
 indexerSettings.put("analysis.analyzer.events.filter", "lowercase");
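
In other words, the key pattern is `analysis.analyzer.<name>.<property>`, which mirrors the nesting in the YAML form. As a rough sketch of that convention, the helper below builds the same three keys for any analyzer name using a plain `java.util.Map`. The class `AnalyzerSettingsSketch` and the method `lowercaseKeywordAnalyzer` are hypothetical names made up for this illustration, not part of Elasticsearch or Jest.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing how the analyzer name is embedded in the keys.
public class AnalyzerSettingsSketch {

    // Builds flat settings for a custom analyzer that uses the keyword
    // tokenizer and the lowercase token filter, under the given name.
    static Map<String, String> lowercaseKeywordAnalyzer(String name) {
        String prefix = "analysis.analyzer." + name + ".";
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(prefix + "type", "custom");
        settings.put(prefix + "tokenizer", "keyword");
        settings.put(prefix + "filter", "lowercase");
        return settings;
    }

    public static void main(String[] args) {
        // Using "lowerKeyword" as the name reproduces the YAML example's keys.
        lowercaseKeywordAnalyzer("lowerKeyword")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each resulting key/value pair could then be passed to `indexerSettings.put(...)` in the same way as the three lines above, assuming that object accepts plain string pairs as it does in the snippet.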
