I am using Elasticsearch in an app, and I am trying to understand its relevance scoring because some search results are not ranked the way I expect.
Here are the relevant fields:

    # Search-only field composed of first_name, middle_name and last_name
    # (full_name is set as the `copy_to` target for those fields)
    full_name:
      type: text
      norms: false
      fields:
        autocomplete:
          type: text
          analyzer: autocomplete_l18n
          search_analyzer: autocomplete_search_l18n
        search:
          type: search_as_you_type

    # Like full_name, built from other fields and used only for search
    address:
      type: text
      fields:
        autocomplete:
          type: text
          analyzer: autocomplete_l18n
          search_analyzer: autocomplete_search_l18n

The custom analyzers I created are:

    # For any fields that require the autocomplete feature
    autocomplete_l18n:
      type: custom
      tokenizer: standard
      filter:
        - en_stop_filter
        - lowercase
        - autocomplete_filter

    # For searching fields that use the autocomplete feature
    autocomplete_search_l18n:
      type: custom
      tokenizer: standard
      filter:
        - en_stop_filter
        - lowercase

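A rough way to see what the two analyzer chains produce is to approximate them in plain Python. This is a sketch under stated assumptions, not Elasticsearch's actual implementation: `autocomplete_filter` is assumed to be an `edge_ngram` token filter with hypothetical `min_gram` 1 / `max_gram` 20, `en_stop_filter` is assumed to be a case-sensitive `stop` filter (only a small subset of English stop words is listed), and the standard tokenizer is crudely approximated by a regex. The function names mirror the analyzer names but are hypothetical stand-ins.

```python
import re

# Assumption: en_stop_filter is a plain "stop" filter with English stop words.
# Only a small, illustrative subset of the English list is used here.
EN_STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Assumption: autocomplete_filter is an edge_ngram token filter.
# These gram sizes are hypothetical; the real values are not shown above.
MIN_GRAM, MAX_GRAM = 1, 20


def standard_tokenize(text):
    """Crude stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def stop_filter(tokens):
    """Drop stop words case-sensitively (assumed, matching the stop filter's default).

    Because en_stop_filter runs before lowercase in both analyzers,
    a capitalized "The" would NOT be removed under this assumption.
    """
    return [t for t in tokens if t not in EN_STOP_WORDS]


def edge_ngrams(tokens, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Emit the leading n-grams of each token, capped at max_gram characters."""
    grams = []
    for token in tokens:
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:n])
    return grams


def autocomplete_l18n(text):
    """Index time: standard -> en_stop_filter -> lowercase -> autocomplete_filter."""
    tokens = [t.lower() for t in stop_filter(standard_tokenize(text))]
    return edge_ngrams(tokens)


def autocomplete_search_l18n(text):
    """Search time: the same chain without the edge n-gram step."""
    return [t.lower() for t in stop_filter(standard_tokenize(text))]


if __name__ == "__main__":
    indexed = set(autocomplete_l18n("Foobar Baz"))
    terms = autocomplete_search_l18n("Foo Baz Lolsville")
    # "foo" is an edge n-gram of "foobar", so a search for "Foo" also matches "Foobar".
    print({term: term in indexed for term in terms})
    # → {'foo': True, 'baz': True, 'lolsville': False}
```

Under these assumptions, "Foo Baz" and "Foobar Baz" both contain the indexed terms `foo` and `baz`, which may be part of why the name field alone does not separate them.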
There are other fields I use in search, but they don't match this query, so I am omitting them here.
I have a bunch of indexed entries of the form:
* Foo Baz | 1 Amityville
* Foobar Baz | 2 Townsville
* Foo Baz | 3 Lolsville
* Foodar Baz | 4 Townsville
* Foo Alice | 5 Lolsville
* Alex Baz | 6 Amityville
* Foo Bob | 7 Townsville
(Format here is: `full_name` | `address`)
The query I am using is:

    {
      "query": {
        "bool": {
          "must": [
            {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "should": [
                        {
                          "multi_match": {
                            "query": "Foo Baz Lolsville",
                            "fields": ["full_name.autocomplete^10", "address.search^8"],
                            "type": "best_fields",
                            "operator": "or",
                            "fuzziness": "AUTO"
                          }
                        },
                        {
                          "multi_match": {
                            "query": "Foo Baz Lolsville",
                            "fields": ["full_name.autocomplete^10", "address.search^8"],
                            "type": "cross_fields",
                            "operator": "or"
                          }
                        },
                        {
                          "multi_match": {
                            "query": "Foo Baz Lolsville",
                            "fields": ["full_name.autocomplete^10", "address.search^8"],
                            "type": "phrase_prefix",
                            "operator": "or"
                          }
                        }
                      ],
                      "minimum_should_match": 1
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }

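Because the three clauses repeat the same query string and field list, a small builder makes it easier to try both test searches without hand-editing the JSON. This is a minimal Python sketch that only produces the request body (it does not send anything to Elasticsearch); `build_query` and `multi_match` are hypothetical helper names, and each `must` holds its clause in a single array rather than a nested one.

```python
import json

# Field list and boosts copied from the query above.
FIELDS = ["full_name.autocomplete^10", "address.search^8"]


def multi_match(text, match_type, **extra):
    """One multi_match clause over FIELDS; extra keys (operator, fuzziness) pass through."""
    clause = {"query": text, "fields": list(FIELDS), "type": match_type}
    clause.update(extra)
    return {"multi_match": clause}


def build_query(text):
    """Hypothetical helper reproducing the query above for an arbitrary search string."""
    should = [
        multi_match(text, "best_fields", operator="or", fuzziness="AUTO"),
        multi_match(text, "cross_fields", operator="or"),
        multi_match(text, "phrase_prefix", operator="or"),
    ]
    inner = {"bool": {"should": should, "minimum_should_match": 1}}
    # The two single-clause "must" wrappers are kept so the output mirrors the original.
    return {"query": {"bool": {"must": [{"bool": {"must": [inner]}}]}}}


if __name__ == "__main__":
    for text in ("Foo Baz Lolsville", "Foo Baz 3 Lolsville"):
        print(json.dumps(build_query(text), indent=2))
```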
When I search for "Foo Baz 3 Lolsville" or "Foo Baz Lolsville", I would expect "Foo Baz | 3 Lolsville" to be the very first result. Instead, it comes back 3rd, 4th, or even lower.
I turned on `explain` for the query, and it shows that the `address` field is searched, BUT its score is folded into a "max of" over the sub-scores. So instead of increasing the score, the address match is effectively a no-op.
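To make the "max of" behaviour concrete: in a dis_max-style combination (which is how `best_fields` combines its per-field scores, with `tie_breaker` defaulting to 0.0), a second matching field adds nothing unless `tie_breaker` is above zero, whereas plain summing rewards a document that matches in both fields. Below is a minimal Python sketch of the two combination rules; the per-field scores are invented purely to show the arithmetic and are not taken from an actual `explain` output.

```python
def dis_max(scores, tie_breaker=0.0):
    """Combine per-field scores like a dis_max query: the best score, plus
    tie_breaker times the sum of the other matching scores."""
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def summed(scores):
    """Combine per-field scores by plain addition."""
    return sum(scores)


if __name__ == "__main__":
    # Invented per-field scores: [full_name.autocomplete, address.search].
    expected_doc = [20.0, 12.0]  # "Foo Baz | 3 Lolsville": name and address both match
    other_doc = [20.0]           # "Foo Baz | 1 Amityville": only the name matches

    print(dis_max(expected_doc), dis_max(other_doc))  # → 20.0 20.0
    print(summed(expected_doc), summed(other_doc))    # → 32.0 20.0
    print(dis_max(expected_doc, tie_breaker=0.5))     # → 26.0
```

With "max of", the two documents tie, so the address match cannot lift the expected document above the other one; with summing (or `tie_breaker` at 1.0) it can.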
What can I do to ensure that the scores from different fields are added together rather than combined with "max of"?