Improve scoring of search results for a multi-field, weighted Elasticsearch query

Chintan_Tank · November 18, 2019, 7:39pm

I am using Elasticsearch in an app and I am trying to understand how search relevance is scored, because some of the results are not what I expect.

Here are the relevant fields:

```yaml
# Search-only field composed of first_name, middle_name and last_name
# (full_name is set as the target of `copy_to` on those fields)
full_name:
  type: text
  norms: false
  fields:
    autocomplete:
      type: text
      analyzer: autocomplete_l18n
      search_analyzer: autocomplete_search_l18n
    search:
      type: search_as_you_type

# Same as full_name: made up of other fields and used purely for search
address:
  type: text
  fields:
    autocomplete:
      type: text
      analyzer: autocomplete_l18n
      search_analyzer: autocomplete_search_l18n
```
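For reference, here is a sketch of the same mapping as the JSON body Elasticsearch expects. It assumes `first_name`, `middle_name` and `last_name` are plain `text` fields that use `copy_to` to populate `full_name`; those three definitions and the helper name `build_mapping` are hypothetical, not taken from the post.

```python
import json

# Shared definition of the "autocomplete" subfield used by both fields.
AUTOCOMPLETE_SUBFIELD = {
    "type": "text",
    "analyzer": "autocomplete_l18n",
    "search_analyzer": "autocomplete_search_l18n",
}


def build_mapping() -> dict:
    """Hypothetical helper: returns the index mapping as a Python dict."""
    properties = {
        # Search-only field, filled via copy_to from the name parts below.
        "full_name": {
            "type": "text",
            "norms": False,
            "fields": {
                "autocomplete": dict(AUTOCOMPLETE_SUBFIELD),
                "search": {"type": "search_as_you_type"},
            },
        },
        # Search-only address field; only an "autocomplete" subfield is defined.
        "address": {
            "type": "text",
            "fields": {"autocomplete": dict(AUTOCOMPLETE_SUBFIELD)},
        },
    }
    # Assumed source fields: each one copies its value into full_name.
    for part in ("first_name", "middle_name", "last_name"):
        properties[part] = {"type": "text", "copy_to": "full_name"}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

As written, `address` only has an `autocomplete` subfield, while the query further down targets `address.search`. If the real mapping has no `address.search`, that part of the query cannot contribute to the score; it may simply have been left out of this excerpt.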

The custom analyzers I created are:

```yaml
# For any field that requires the autocomplete feature
autocomplete_l18n:
  type: custom
  tokenizer: standard
  filter:
    - en_stop_filter
    - lowercase
    - autocomplete_filter

# For any field that is searched using the autocomplete feature
autocomplete_search_l18n:
  type: custom
  tokenizer: standard
  filter:
    - en_stop_filter
    - lowercase
```
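To see which tokens these two analyzers roughly produce, here is a small simulation. The definitions of `en_stop_filter` and `autocomplete_filter` are not shown in the post, so this sketch assumes a small case-sensitive English stop list and an `edge_ngram` filter with `min_gram=1` and `max_gram=20`; the standard tokenizer is approximated by a regex over word characters. The function names are hypothetical.

```python
import re

# Assumed stop list standing in for en_stop_filter (case-sensitive, as it
# runs before lowercase in both analyzers).
STOP_WORDS = {"a", "an", "and", "of", "the"}


def tokenize(text):
    # Rough approximation of the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def search_analyzer(text):
    # autocomplete_search_l18n: standard -> en_stop_filter -> lowercase
    return [t.lower() for t in tokenize(text) if t not in STOP_WORDS]


def index_analyzer(text, min_gram=1, max_gram=20):
    # autocomplete_l18n: same chain, then edge n-grams of every token
    # (assumed behaviour of autocomplete_filter).
    grams = []
    for token in search_analyzer(text):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            grams.append(token[:n])
    return grams


if __name__ == "__main__":
    print(index_analyzer("Foobar Baz"))
    print(search_analyzer("Foo Baz Lolsville"))
```

Under these assumptions, the search token `foo` is one of the indexed edge n-grams of `Foo`, `Foobar` and `Foodar` alike, so on `full_name.autocomplete` those names can score very close to each other.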

There are other fields I use in search, but they don't match my query, so I am omitting them here.

I have a bunch of indexed entries of the form:

* Foo Baz | 1 Amityville
* Foobar Baz | 2 Townsville
* Foo Baz | 3 Lolsville
* Foodar Baz | 4 Townsville
* Foo Alice | 5 Lolsville
* Alex Baz | 6 Amityville
* Foo Bob | 7 Townsville

(Format here is: `full_name` | `address`)

The query I am using is:

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "best_fields",
                        "operator": "or",
                        "fuzziness": "AUTO"
                      }
                    },
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "cross_fields",
                        "operator": "or"
                      }
                    },
                    {
                      "multi_match": {
                        "query": "Foo Baz Lolsville",
                        "fields": [
                          "full_name.autocomplete^10",
                          "address.search^8"
                        ],
                        "type": "phrase_prefix",
                        "operator": "or"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}
```
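The three `multi_match` clauses differ only in `type` and a few options, so the query can also be generated. This is a sketch with hypothetical helper names; field names, boosts and options are copied from the query above. The two outer single-clause `bool`/`must` wrappers are left out here: with only one clause each, they should not change which documents match or how they score.

```python
import json

# Fields and boosts copied from the posted query.
FIELDS = ["full_name.autocomplete^10", "address.search^8"]


def multi_match(text, match_type, **options):
    """Hypothetical helper: one multi_match clause over FIELDS."""
    clause = {"query": text, "fields": list(FIELDS), "type": match_type}
    clause.update(options)
    return {"multi_match": clause}


def build_query(text):
    """Hypothetical helper: bool/should over the three multi_match variants."""
    return {
        "query": {
            "bool": {
                "should": [
                    multi_match(text, "best_fields", operator="or", fuzziness="AUTO"),
                    multi_match(text, "cross_fields", operator="or"),
                    multi_match(text, "phrase_prefix", operator="or"),
                ],
                # At least one of the three clauses has to match.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Foo Baz Lolsville"), indent=2))
```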

Now, when I search for "Foo Baz 3 Lolsville" or "Foo Baz Lolsville", I would expect "Foo Baz | 3 Lolsville" to be the very first result. Instead, it comes back 3rd, 4th, or even lower.

I turned on explain mode for the query, and it shows that the address field is searched, BUT its score is part of a "max of" over the sub-scores. So instead of increasing the score, the address match is effectively a no-op.
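The "max of" in the explain output matches how `best_fields` is documented: the per-field queries are wrapped in a `dis_max` query, whose score is the best field's score plus `tie_breaker` times the scores of the other matching fields, and `tie_breaker` defaults to 0.0. `most_fields`, or separate clauses in a `bool` `should`, add the per-field scores instead. The arithmetic below illustrates the difference; the per-field scores are made-up numbers, not values from the explain output.

```python
def dis_max(scores, tie_breaker=0.0):
    # best_fields / dis_max: best score plus tie_breaker times the rest.
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def summed(scores):
    # most_fields / bool "should": every matching field's score is added.
    return sum(scores)


# Illustrative (invented) scores for the name and address fields.
name_score, address_score = 12.0, 5.0

if __name__ == "__main__":
    print(dis_max([name_score, address_score]))       # address adds nothing
    print(dis_max([name_score, address_score], 0.5))  # address adds half
    print(summed([name_score, address_score]))        # address adds fully
```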

What can I do to make sure the scores from the different fields are added together, rather than combined with "max of"?

system · December 16, 2019, 7:40pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
