Hi Torben,
Indeed, this is because the ngram FILTER writes all grams of a term at the
same position (like synonyms), while the TOKENIZER generates a stream of
tokens with consecutive positions. As a result, blablablafoobarbarbar gets a
larger number of positions than barfoobar, and thus a smaller length
normalization factor.
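To make the position argument concrete, here is a minimal Python sketch of how the two setups count field length. It is a simplified model, not Lucene code: `ngrams`, `position_increments`, `field_length` and `length_norm` are hypothetical helpers, and it assumes classic TF-IDF length normalization (1 / sqrt(field length)) with overlapping tokens (position increment 0) not counted. It ignores the lossy one-byte norm encoding and every other scoring factor (tf, idf, boosts).

```python
import math


def ngrams(word, min_gram=1, max_gram=50):
    """All character n-grams of `word` with lengths in [min_gram, max_gram]."""
    grams = []
    for n in range(min_gram, min(max_gram, len(word)) + 1):
        for start in range(len(word) - n + 1):
            grams.append(word[start:start + n])
    return grams


def position_increments(word, mode):
    """Position increment of each gram emitted for a single word.

    mode="tokenizer": every gram advances the position by 1.
    mode="filter": the first gram advances by 1, the rest are stacked
    at the same position (increment 0), like synonyms.
    """
    grams = ngrams(word)
    if not grams:
        return []
    if mode == "tokenizer":
        return [1] * len(grams)
    if mode == "filter":
        return [1] + [0] * (len(grams) - 1)
    raise ValueError(f"unknown mode: {mode}")


def field_length(increments):
    # Assumption: tokens with a position increment of 0 are treated as
    # overlaps and are not counted in the field length.
    return sum(1 for inc in increments if inc > 0)


def length_norm(word, mode):
    # Assumption: classic TF-IDF length norm, 1 / sqrt(field length).
    return 1.0 / math.sqrt(field_length(position_increments(word, mode)))


for mode in ("filter", "tokenizer"):
    for word in ("barfoobar", "blablablafoobarbarbar"):
        incs = position_increments(word, mode)
        print(f"{mode:9} {word:22} length={field_length(incs):3} "
              f"norm={length_norm(word, mode):.4f}")
```

With min_gram 1 and max_gram 50, a word of length L yields L(L+1)/2 grams, so in this model the filter gives both words a length of 1 (equal norms), while the tokenizer gives barfoobar 45 positions and blablablafoobarbarbar 231, which penalizes the longer word.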
On Thu, Mar 5, 2015 at 9:56 PM, Torben <can.i.hazz@gmail.com> wrote:
If anyone is interested:
I resolved this issue by replacing the nGram filter with an nGram tokenizer:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        },
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "50"
        }
      }
    }
  }
}

Now, the term "barfoobar" has a higher score than "blablablafoobarbarbar".
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/521b01f1-c33f-4444-a618-edb458cd6717%40googlegroups.com
For more options, visit https://groups.google.com/d/optout.
--
Adrien Grand