Stemming acronyms ending in "s"; keyword marker token filter; minimal english stemmer


(Loren Siebert) #1

Using the minimal_english stemmer (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html),
acronym tokens like "irs" and "nps" get stemmed to "ir" and "np". I can use
the keyword marker token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-marker-tokenfilter.html) to specify a list of acronyms to protect, but I do not know them all in
advance, so I will be constantly tweaking the list and reindexing.

Ideally, I would like to be able either to tell the keyword marker to
protect tokens 1-4 characters in length, or to tell the minimal English
stemmer to ignore tokens shorter than 5 characters.

Is either of those options possible?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e385b457-6eed-4a98-975d-9cf19375c39f%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
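To make the problem concrete, here is a small Python model. `minimal_english_stem` approximates the suffix rules of Lucene's `EnglishMinimalStemmer` (the class behind `minimal_english`) as I understand them, so treat it as a sketch, not a faithful port. `stem_unless_short` and its `min_length` parameter are hypothetical: they illustrate the length-based protection asked for above and are not an existing Elasticsearch setting.

```python
def minimal_english_stem(token: str) -> str:
    """Approximation of the S-stemmer rules used by Lucene's EnglishMinimalStemmer."""
    n = len(token)
    if n < 3 or token[-1] != "s":
        return token                      # too short, or no trailing "s"
    prev = token[-2]
    if prev in ("u", "s"):
        return token                      # "-us", "-ss": leave alone ("bus", "glass")
    if prev == "e":
        if n > 3 and token[-3] == "i" and token[-4] not in ("a", "e"):
            return token[:-3] + "y"       # "-ies" -> "-y" ("queries" -> "query")
        if token[-3] in ("i", "a", "o", "e"):
            return token                  # "-ies" after a/e, "-aes", "-oes", "-ees": unchanged
    return token[:-1]                     # otherwise drop the trailing "s" ("irs" -> "ir")


def stem_unless_short(token: str, min_length: int = 5) -> str:
    """Hypothetical option: treat tokens shorter than min_length as keywords."""
    if len(token) < min_length:
        return token                      # protected, like a keyword-marked token
    return minimal_english_stem(token)


if __name__ == "__main__":
    for t in ["irs", "nps", "parks", "queries", "cats"]:
        print(t, minimal_english_stem(t), stem_unless_short(t))
```

With the cutoff, "irs" and "nps" survive intact while "parks" still becomes "park". The trade-off is that genuine short plurals such as "cats" are no longer stemmed either.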


(Adrien Grand) #2

I think it would be nice to support protecting tokens based on their
length. Maybe you can open an issue about it?

(In reply to Loren Siebert, Wed, Jan 22, 2014 at 5:10 PM)

--
Adrien Grand
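For readers on a newer release: the Elasticsearch reference for the keyword marker filter documents a `keywords_pattern` parameter (a Java regular expression; matching tokens are marked as keywords and skipped by stemmers). Check the docs for your version, since it is not necessarily available in the 2014-era releases discussed here. The sketch below builds index settings that protect 1-4 character tokens that way. The filter and analyzer names (`protect_short_tokens`, `english_minimal`, `acronym_safe_english`) are hypothetical, and `is_protected` is only a local preview using Python's `re`, not Elasticsearch's Java regex engine.

```python
import json
import re

# Tokens of 1-4 characters will be marked as keywords (hypothetical rule).
SHORT_TOKEN_PATTERN = r"^.{1,4}$"

index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Marks matching tokens as keywords so later stemmers skip them.
                "protect_short_tokens": {
                    "type": "keyword_marker",
                    "keywords_pattern": SHORT_TOKEN_PATTERN,
                },
                "english_minimal": {
                    "type": "stemmer",
                    "language": "minimal_english",
                },
            },
            "analyzer": {
                "acronym_safe_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: the marker must run before the stemmer.
                    "filter": ["lowercase", "protect_short_tokens", "english_minimal"],
                },
            },
        }
    }
}


def is_protected(token: str) -> bool:
    """Preview which tokens the pattern would mark (Python re, not Java regex)."""
    return re.fullmatch(SHORT_TOKEN_PATTERN, token) is not None


if __name__ == "__main__":
    print(json.dumps(index_settings, indent=2))
```

The JSON printed by this script is what you would send as the body when creating the index.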



(Loren Siebert) #3

Done!

(In reply to Adrien Grand, Thursday, January 23, 2014 1:51:46 PM UTC-8)

