Help with Ngrams - also index non-Ngrams

Hi

We are trying to use ngrams in our index. Currently we have this
configuration:

                "ngram":{
                    "tokenizer":"whitespace",
                    "filter":[
                        "standard",
                        "lowercase",
                        "ngram",
                        "catenate_words"
                    ]
                },

....
                "ngram":{
                    "type":"nGram",
                    "min_gram":3,
                    "max_gram":50
                },

When looking at a term like "mc donalds" we see that the "mc" part is not
put into the index. Does anybody have an idea how to configure the index so
that we get "mc" as well as "don", "ald", etc. (without changing the min_gram
setting)?

/Per

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I believe it's the min_gram:3 setting that prevents any term shorter than 3
characters from producing an ngram.

(In reply to Per Ekman's message of Thursday, February 28, 2013 10:22:15 AM UTC-5.)

I am afraid you can't do this at this point. Terms shorter than the
min_gram size will always be dropped (swallowed).

simon
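
To make the behaviour described above concrete, here is a small plain-Python simulation of the analysis chain from the original post (whitespace tokenizer, lowercase, then n-grams with min_gram 3 and max_gram 50). This is only a sketch: the function names `ngrams` and `analyze` are made up for illustration, it does not call Elasticsearch or Lucene, and the order in which a real ngram filter emits its grams may differ.

```python
def ngrams(token, min_gram=3, max_gram=50):
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # no longer gram fits from this start position
            grams.append(token[start:start + size])
    return grams


def analyze(text, min_gram=3, max_gram=50):
    """Whitespace-split, lowercase, then n-gram each token (illustrative only)."""
    terms = []
    for token in text.lower().split():
        terms.extend(ngrams(token, min_gram, max_gram))
    return terms


terms = analyze("mc donalds")
print(terms)
```

Because "mc" has only two characters, no gram of length 3 or more fits inside it, so it contributes nothing to `terms`, while "donalds" yields "don", "ald", "donalds" and so on.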

(In reply to Per Ekman's message of Thursday, February 28, 2013 4:22:15 PM UTC+1.)

Alright, I guess we'll have to use the combo plugin in some way to do it.
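
The general idea of combining two analysis chains can be sketched in plain Python as follows. This is an assumption about how such a combination might behave, not the combo plugin's actual API or configuration: `whitespace_terms`, `ngram_terms` and `combined_terms` are hypothetical helpers that merge the plain whitespace tokens with the n-grams, so that short tokens survive alongside the grams.

```python
def ngrams(token, min_gram=3, max_gram=50):
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def whitespace_terms(text):
    """First chain (hypothetical): whitespace tokenizer plus lowercase."""
    return text.lower().split()


def ngram_terms(text, min_gram=3, max_gram=50):
    """Second chain (hypothetical): the same tokens, expanded into n-grams."""
    terms = []
    for token in whitespace_terms(text):
        terms.extend(ngrams(token, min_gram, max_gram))
    return terms


def combined_terms(text):
    """Merge both chains, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for term in whitespace_terms(text) + ngram_terms(text):
        if term not in seen:
            seen.add(term)
            merged.append(term)
    return merged


combined = combined_terms("mc donalds")
print(combined)
```

With this combination "mc" is indexed as a whole token even though min_gram stays at 3, and the grams of "donalds" are still present.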

(In reply to simonw, simon.willnauer@elasticsearch.com, Fri, Mar 1, 2013 at 12:05 PM.)