I'm having trouble getting the best name matches when querying across
multiple fields with dis_max. Here's a complete test case for what I'm
doing:
https://gist.github.com/2059195
In summary, I'm trying to search through data like:
doc 1:
    "object_name" : "Albino Elephant"
doc 2:
    "object_name" : "Cute Albino Elephant"
    "object_name_other_language" : "Cute Albino Elephant"
doc 3:
    "object_name" : "The Cutest Albino Elephant"
    "object_name_other_language" : "The Cutest Albino Elephant"
That is, a bunch of objects whose names have been translated into
different languages. Each translation lives in its own field because I
want more accurate matches than I'd get from an array, and because I
want a special analyzer per language.
When I do a search for "Albino Elephant" with this query:
    "dis_max" : {
        "queries" : [
            {
                "text" : {
                    "object_name.unmunged" : "Albino Elephant"
                }
            },
            {
                "text" : {
                    "object_name_other_language.unmunged" : "Albino Elephant"
                }
            }
        ]
    }
doc #3 is the highest-scoring hit, but if I were to change doc #1 to:
doc 1:
    "object_name" : "Albino Elephant"
    "object_name_other_language" : "Albino Elephant"
it would be the #1 hit, so dis_max seems to be behaving like a
bool/should query and adding the scores together. I thought the whole
point of dis_max was to "execute all these queries, compute their
scores, and pick the highest one", so that you could search through
e.g. multiple translations without giving objects a higher score just
because they have more translations.
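
To make that expectation concrete, here is a minimal sketch of the
scoring I'd expect. It assumes the formula documented for Lucene's
DisjunctionMaxQuery (best sub-query score, plus tie_breaker times the
sum of the other sub-query scores, with tie_breaker defaulting to 0).
The function names and the per-field scores are made up for
illustration; they are not output from the gist, and the bool/should
version ignores coord and normalization factors.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best sub-score plus tie_breaker times the sum of the others."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


def bool_should_score(sub_scores):
    """Simplified bool/should: plain sum of the matching clause scores."""
    return sum(sub_scores)


# Hypothetical per-field scores (assumed values, not real Lucene scores).
one_field = [1.0]        # document with a name in one field only
two_fields = [1.0, 1.0]  # same name present in both language fields

# With dis_max and tie_breaker=0, the extra translation should not help.
print(dis_max_score(one_field), dis_max_score(two_fields))

# With a plain sum, the extra translation doubles the score.
print(bool_should_score(one_field), bool_should_score(two_fields))
```

If dis_max worked the way this sketch does, adding a second translation
to doc #1 would not change its score from this query by itself.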
Or maybe something else is going on here. Clinton Gormley suggested on
IRC that this might be because I had more than one shard, but I changed
the gist to create only one shard and still get the same results.