Improve scoring of search results for a multi-field, weighted Elasticsearch query

I am using Elasticsearch in an app and am trying to understand how relevance scoring works, because the ranking of my search results is not what I expect.

Some of the fields,

# This is a search-only field composed of first_name, middle_name and last_name. I have set full_name as the target for `copy_to`.
* full_name
  type: text
  norms: false
  fields:
    autocomplete:
      type: text
      analyzer: autocomplete_l18n
      search_analyzer: autocomplete_search_l18n
    search:
      type: search_as_you_type
# Like full_name, this is a search-only field composed of other fields
* address
  type: text
  fields:
    autocomplete:
      type: text
      analyzer: autocomplete_l18n
      search_analyzer: autocomplete_search_l18n

The custom analyzers I created are,

# For any fields that require the autocomplete feature
autocomplete_l18n:
  type: custom
  tokenizer: standard
  filter:
  - en_stop_filter
  - lowercase
  - autocomplete_filter


# For any fields that would be searched using autocomplete feature
autocomplete_search_l18n:
  type: custom
  tokenizer: standard
  filter:
  - en_stop_filter
  - lowercase

There are other fields I use in search, but they don't match my query, so I am omitting them here.

I have a bunch of indexed entries of the form,

* Foo Baz | 1 Amityville
* Foobar Baz | 2 Townsville
* Foo Baz | 3 Lolsville
* Foodar Baz | 4 Townsville
* Foo Alice | 5 Lolsville
* Alex Baz | 6 Amityville
* Foo Bob | 7 Townsville

(Format here is: `full_name` | `address`)

The query I have used is,

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "best_fields",
                        "operator": "or",
                        "fuzziness": "AUTO"
                      }
                    },
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "cross_fields",
                        "operator": "or"
                      }
                    },
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "phrase_prefix",
                        "operator": "or"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Now when I search with "Foo Baz 3 Lolsville" or "Foo Baz Lolsville",

I would expect to get "Foo Baz | 3 Lolsville" as the very first result. Instead, it shows up as the 3rd or 4th result, or even lower.

I turned on explain mode in the query, and it seems that the search on address is performed, BUT its score is part of a "max of" over the per-field sub-scores. Hence, instead of increasing the score, the address match is basically a no-op.
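To check that I am reading the explain output correctly, here is a minimal Python sketch of how I understand a dis_max-style combination to work: the best field score is kept, and the remaining field scores are multiplied by a tie breaker before being added. The function name `dis_max_score` and the two per-field scores are made up for illustration; they are not values from my actual explain output.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way a dis_max-style query is documented to:
    best score + tie_breaker * (sum of the other scores)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical per-field scores for one document (illustrative only).
name_score = 12.0      # e.g. a match on full_name.autocomplete
address_score = 8.0    # e.g. a match on the address field

# tie_breaker = 0.0: only the best field counts, so the address match adds nothing.
max_only = dis_max_score([name_score, address_score])

# tie_breaker = 1.0: every matching field contributes in full (a plain sum).
summed = dis_max_score([name_score, address_score], tie_breaker=1.0)

# Anything in between gives the other fields partial credit.
partial = dis_max_score([name_score, address_score], tie_breaker=0.5)

print(max_only, summed, partial)
```

If that reading is right, then with a tie breaker of 0 a document matching on both name and address scores the same as one matching on name alone, which would explain the ranking I am seeing. As far as I can tell, `multi_match` accepts a `tie_breaker` parameter, and its `most_fields` type adds up the scores of the matching fields, so those look like the two knobs to experiment with.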

What can I do here to ensure that the scores from different fields are added together, rather than combined with "max of"?