loren
(Loren Siebert)
January 22, 2014, 4:10pm
1
Using the minimal_english stemmer (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html),
acronym tokens like "irs" and "nps" get stemmed to "ir" and "np". I can use
the keyword marker token filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-marker-tokenfilter.html) to specify a list of acronyms to protect, but I do not know them all in
advance so I will be constantly tweaking the list and reindexing.
Ideally, I would like to be able to either tell the keyword marker to
protect tokens 1-4 characters in length, or tell the minimal english
stemmer to ignore tokens shorter than 5 characters.
Is either of those options possible?
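For reference, the list-based approach described above could look roughly like the following. This is only a sketch: the filter and analyzer names (protect_acronyms, minimal_english_stemmer, acronym_safe_english) and the helper function are made up for illustration, and the choice of the standard tokenizer plus a lowercase filter is an assumption. It just builds the index settings as a Python dict and prints them as JSON.

```python
import json


def build_analysis_settings(protected_acronyms):
    """Build index analysis settings that shield the given tokens from stemming.

    The filter/analyzer names are hypothetical; only the filter types
    ("keyword_marker", "stemmer") and their options come from Elasticsearch.
    """
    return {
        "analysis": {
            "filter": {
                # Tokens listed here are flagged as keywords, so stemmers skip them.
                "protect_acronyms": {
                    "type": "keyword_marker",
                    "keywords": sorted(protected_acronyms),
                },
                "minimal_english_stemmer": {
                    "type": "stemmer",
                    "language": "minimal_english",
                },
            },
            "analyzer": {
                "acronym_safe_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: the marker has to run before the stemmer.
                    "filter": [
                        "lowercase",
                        "protect_acronyms",
                        "minimal_english_stemmer",
                    ],
                }
            },
        }
    }


settings = build_analysis_settings({"irs", "nps"})
print(json.dumps(settings, indent=2))
```

The drawback is the one already mentioned: every newly discovered acronym means editing the keywords list and reindexing.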
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com .
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e385b457-6eed-4a98-975d-9cf19375c39f%40googlegroups.com .
For more options, visit https://groups.google.com/groups/opt_out .
jpountz
(Adrien Grand)
January 23, 2014, 9:51pm
2
I think it would be nice to support protecting tokens based on their
length. Maybe you can open an issue about it?
loren
(Loren Siebert)
January 23, 2014, 10:01pm
3
Done!
GitHub issue, opened 10:00PM - 23 Jan 14 UTC, closed 03:13PM - 28 Mar 17 UTC
Labels: >feature, help wanted, :Search/Analysis
I would like to be able to tell the keyword marker to protect tokens 1-4 characters in length, or tell the minimal english stemmer to ignore tokens shorter than 5 characters.
Perhaps the more generic thing to have would be a Minimum Length Keyword Marker that could go in front of the other filters.
Based on discussion at https://groups.google.com/forum/#!msg/elasticsearch/uFlKWq2HvQk/mM8KjaItPH0J
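To make the request concrete, here is a small Python simulation of how a "minimum length keyword marker" placed in front of a stemmer might behave. It is not Elasticsearch or Lucene code: Token, mark_short_tokens, toy_plural_stemmer and analyze are hypothetical names, and the stemmer is a deliberately crude stand-in that only strips a trailing "s", not the real minimal_english algorithm.

```python
from dataclasses import dataclass


@dataclass
class Token:
    term: str
    keyword: bool = False  # when True, the stemmer below leaves the term alone


def mark_short_tokens(tokens, min_stem_length):
    """Flag every token shorter than min_stem_length as a keyword."""
    for tok in tokens:
        if len(tok.term) < min_stem_length:
            tok.keyword = True
        yield tok


def toy_plural_stemmer(tokens):
    """Crude stand-in for a stemmer: drop one trailing 's' unless protected."""
    for tok in tokens:
        if (
            not tok.keyword
            and tok.term.endswith("s")
            and not tok.term.endswith("ss")
        ):
            tok.term = tok.term[:-1]
        yield tok


def analyze(terms, min_stem_length=5):
    """Run the marker, then the stemmer, and return the resulting terms."""
    tokens = (Token(t) for t in terms)
    marked = mark_short_tokens(tokens, min_stem_length)
    return [tok.term for tok in toy_plural_stemmer(marked)]


print(analyze(["irs", "nps", "forms"]))                     # ['irs', 'nps', 'form']
print(analyze(["irs", "nps", "forms"], min_stem_length=0))  # ['ir', 'np', 'form']
```

With a threshold of 5, tokens of 1-4 characters pass through untouched while longer tokens are still stemmed, which is the behavior asked for without maintaining a list of acronyms.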