Analyzer settings for breaking up words on hyphens

Hello,

I have a field that uses the whitespace tokenizer, but I also want to
tokenize on hyphens (-), as the standard analyzer does. I'm having
trouble figuring out which additional custom settings I need to add in
order to split on hyphens as well.

Thanks,
Mike

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALdNedLtdAWEiQN%2BoUV17J5e8DowMbDva2pJn1S%3Dr9w1qtP9bA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

You can either use a pattern tokenizer whose pattern matches whitespace
and hyphens, or further decompose your tokens after tokenization with the
word delimiter token filter, which is much harder to use (and might be
overkill for your use case).

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

Cheers,

Ivan
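
A minimal sketch of the pattern tokenizer route, written as a Python dict
that could be serialized and sent as the index settings body. The names
"ws_hyphen_tokenizer" and "ws_hyphen_analyzer" are hypothetical, and the
pattern is an assumption about what "whitespace + hyphen" should mean here
(runs of whitespace and/or hyphens). The helper only approximates the
tokenizer locally with Python's re module; it is not how Elasticsearch
itself evaluates the pattern.

```python
import json
import re

# Assumption: split on any run of whitespace and/or hyphen characters.
SPLIT_PATTERN = r"[\s\-]+"

# Index settings body. The tokenizer and analyzer names are hypothetical,
# chosen only for this sketch.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ws_hyphen_tokenizer": {
                    "type": "pattern",
                    "pattern": SPLIT_PATTERN,
                }
            },
            "analyzer": {
                "ws_hyphen_analyzer": {
                    "type": "custom",
                    "tokenizer": "ws_hyphen_tokenizer",
                }
            },
        }
    }
}


def approximate_tokens(text):
    """Rough local approximation of the split, dropping empty tokens."""
    return [token for token in re.split(SPLIT_PATTERN, text) if token]


if __name__ == "__main__":
    # JSON that could be used as the body of an index creation request.
    print(json.dumps(settings, indent=2))
    print(approximate_tokens("wi-fi enabled  e-mail client"))
```

The field's mapping would then need to reference the custom analyzer by
name. Running sample text through the _analyze API is a good way to confirm
the tokens on a real cluster.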


Thanks! I'll go ahead and try the pattern tokenizer route.


Or you could cheat and use a character filter to turn hyphens into
spaces before the whitespace tokenizer runs. Lots of ways to skin a cat.
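
One way the character filter idea could look, again as a Python dict for
the index settings body. This sketch assumes the pattern_replace character
filter; the post does not say which character filter was meant, and the
mapping character filter would be another option. The names
"hyphen_to_space" and "ws_hyphen_analyzer" are hypothetical, and the helper
is only a local approximation of the resulting tokens.

```python
import json

# Index settings body. The char filter and analyzer names are hypothetical.
settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "hyphen_to_space": {
                    # Assumption: pattern_replace is used to rewrite hyphens.
                    "type": "pattern_replace",
                    "pattern": "-",
                    "replacement": " ",
                }
            },
            "analyzer": {
                "ws_hyphen_analyzer": {
                    "type": "custom",
                    "char_filter": ["hyphen_to_space"],
                    # The existing whitespace tokenizer stays in place.
                    "tokenizer": "whitespace",
                }
            },
        }
    }
}


def approximate_tokens(text):
    """Local approximation: replace hyphens, then split on whitespace."""
    return text.replace("-", " ").split()


if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
    print(approximate_tokens("wi-fi enabled  e-mail client"))
```

Because the character filter runs before tokenization, the rest of the
analyzer can stay as it is.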
