Analyzer settings for breaking up words on hyphens

Mike_Topper1 · October 27, 2014, 11:55am

Hello,

I have a field that is using the whitespace tokenizer, but I also want to
tokenize on hyphens (-) like the standard analyzer does. I'm having
trouble figuring out which additional custom settings I would have to add
in order to split on hyphens as well.

Thanks,
Mike

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALdNedLtdAWEiQN%2BoUV17J5e8DowMbDva2pJn1S%3Dr9w1qtP9bA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Ivan · October 27, 2014, 5:22pm

You can either use a pattern tokenizer with your pattern being whitespace +
hyphen, or further decompose your tokens post tokenization with the word
delimiter token filter, which is much harder to use (and might be
overkill for your use case).

Cheers,

Ivan
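
A minimal sketch of the pattern tokenizer route, assuming a regex that splits on runs of whitespace and/or hyphens. The names ws_hyphen_tokenizer and ws_hyphen_analyzer are hypothetical, and the regex is an assumption rather than something given in the thread. The dict is the JSON body you would send when creating the index; the helper only approximates the split locally with Python's re module and is not a substitute for checking the result with the _analyze API.

```python
import json
import re

# Assumed pattern: one or more whitespace characters and/or hyphens act as a separator.
SPLIT_PATTERN = r"[\s-]+"

# Index settings body. The tokenizer and analyzer names are hypothetical.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "ws_hyphen_tokenizer": {
                    "type": "pattern",
                    "pattern": SPLIT_PATTERN,
                }
            },
            "analyzer": {
                "ws_hyphen_analyzer": {
                    "type": "custom",
                    "tokenizer": "ws_hyphen_tokenizer",
                }
            },
        }
    }
}


def approximate_tokens(text):
    """Local approximation only: split on the pattern and drop empty strings."""
    return [tok for tok in re.split(SPLIT_PATTERN, text) if tok]


if __name__ == "__main__":
    print(json.dumps(index_settings, indent=2))
    print(approximate_tokens("wi-fi enabled  e-mail"))
```

No lowercasing or other token filters are applied here, which matches the behaviour of a bare whitespace tokenizer apart from the extra split on hyphens.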


Mike_Topper1 · October 27, 2014, 11:07pm

Thanks! I'll go ahead and try the pattern tokenizer route.


nik9000 · October 28, 2014, 12:57am

Or you could cheat and use a character filter to turn the hyphens into
spaces. Lots of ways to skin a cat.
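
A rough sketch of the character filter idea, assuming a pattern_replace char filter placed in front of the existing whitespace tokenizer. The names hyphen_to_space and ws_hyphen_charfilter_analyzer are hypothetical, and the choice of pattern_replace (rather than some other char filter) is an assumption, since the post does not name one. The helper mimics the effect locally and is only an approximation of what the analyzer would emit.

```python
import json

# Index settings body. The char filter and analyzer names are hypothetical.
index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                "hyphen_to_space": {
                    "type": "pattern_replace",
                    "pattern": "-",
                    "replacement": " ",
                }
            },
            "analyzer": {
                "ws_hyphen_charfilter_analyzer": {
                    "type": "custom",
                    "char_filter": ["hyphen_to_space"],
                    "tokenizer": "whitespace",
                }
            },
        }
    }
}


def approximate_tokens(text):
    """Local approximation only: replace hyphens with spaces, then split on whitespace."""
    return text.replace("-", " ").split()


if __name__ == "__main__":
    print(json.dumps(index_settings, indent=2))
    print(approximate_tokens("wi-fi enabled  e-mail"))
```

With this approach the field keeps its whitespace tokenizer unchanged; only the text fed into it is altered.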

