I have a field that is using the whitespace tokenizer, but I also want to
tokenize on hyphens (-) like the standard analyzer does. I'm having
trouble figuring out what additional custom settings I would have to put in
there in order to be able to tokenize off of hyphens as well.
You can either use a pattern tokenizer with a pattern that matches both
whitespace and hyphens, or further decompose your tokens after tokenization
with the word delimiter token filter, which is much harder to use (and might
be overkill for your use case).
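
For the pattern tokenizer route, a rough sketch of what the index settings might look like, written as a Python dict so it can be dumped to JSON and sent when creating the index. The names `whitespace_hyphen` and `whitespace_hyphen_analyzer` are made up for illustration, and the pattern `[\s-]+` is just one way to express "whitespace or hyphen". The `preview_tokens` helper only approximates the split locally with Python's `re` module; it is not Elasticsearch itself, so check the real output with the analyze API.

```python
import json
import re

# Assumed pattern: one or more whitespace characters or hyphens act as a separator.
SPLIT_PATTERN = r"[\s-]+"

# Hypothetical names; only "type", "pattern" and "tokenizer" are Elasticsearch keys.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "whitespace_hyphen": {
                    "type": "pattern",
                    "pattern": SPLIT_PATTERN,
                }
            },
            "analyzer": {
                "whitespace_hyphen_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace_hyphen",
                }
            },
        }
    }
}


def preview_tokens(text):
    """Local approximation of the split, dropping empty pieces at the edges."""
    return [piece for piece in re.split(SPLIT_PATTERN, text) if piece]


if __name__ == "__main__":
    # JSON body that could be used when creating the index.
    print(json.dumps(index_settings, indent=2))
    print(preview_tokens("wi-fi router  setup"))
```

Unlike the standard analyzer, this keeps everything else about whitespace tokenization: punctuation other than the hyphen stays attached to its token and nothing is lowercased unless you add a filter.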
Cheers,
Ivan
On Mon, Oct 27, 2014 at 7:55 AM, Mike Topper topper@gmail.com wrote:
Hello,
I have a field that is using the whitespace tokenizer, but I also want to
tokenize on hyphens (-) like the standard analyzer does. I'm having
trouble figuring out what additional custom settings I would have to put in
there in order to be able to tokenize off of hyphens as well.
Thanks! I'll go ahead and try the pattern tokenizer route.
On Mon, Oct 27, 2014 at 1:22 PM, Ivan Brusic ivan@brusic.com wrote:
You can either use a pattern tokenizer with your patterns being whitespace
+ hyphen, or further decompose your token post tokenization with the word
delimiter token filter, which is much harder to use (and might be an
overkill for your use case).
Or you could cheat and use a character filter to turn hyphens into
spaces before the whitespace tokenizer runs. Lots of ways to skin a cat.
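
A sketch of the character filter variant, again as a Python dict for the index settings. It assumes the `pattern_replace` character filter for the hyphen-to-space step; the names `hyphen_to_space` and `hyphen_split` are hypothetical. The `preview_char_filter_tokens` helper just mimics the idea locally (replace, then split on whitespace) and is not a substitute for testing against a real index.

```python
import json

# Hypothetical names; "pattern_replace" and "whitespace" are built-in types.
char_filter_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "hyphen_to_space": {
                    "type": "pattern_replace",
                    "pattern": "-",
                    "replacement": " ",
                }
            },
            "analyzer": {
                "hyphen_split": {
                    "type": "custom",
                    "char_filter": ["hyphen_to_space"],
                    "tokenizer": "whitespace",
                }
            },
        }
    }
}


def preview_char_filter_tokens(text):
    """Local approximation: replace hyphens with spaces, then split on whitespace."""
    return text.replace("-", " ").split()


if __name__ == "__main__":
    print(json.dumps(char_filter_settings, indent=2))
    print(preview_char_filter_tokens("state-of-the-art search"))
```

The appeal of this route is that the field keeps using the plain whitespace tokenizer, so the only change to the existing analyzer is the extra `char_filter` entry.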
On Mon, Oct 27, 2014 at 7:07 PM, Mike Topper topper@gmail.com wrote:
Thanks! I'll go ahead and try the pattern tokenizer route.