How to boost nested query under bool should query?

joonhocho · July 23, 2017, 9:44am

I want to boost clauses individually under bool.should query.

Here's my data structure:

person = {
  username,
  names: [{first, last}], // nested
  addresses: [{city, state}], // nested
}

I want to boost username by 3, names by 2, and addresses by 1.

How can I achieve this?

Omni query text may be something like "john doe san francisco", and I want to boost results with matched names over results with matched addresses.

Here's my query without boost:

query: {
  bool: {
    should: [
      {match: {username: text}},
      {
        nested: {
          path: 'names',
          query: {
            multi_match: {
              query: text,
              type: 'cross_fields',
              fields: [
                'names.first',
                'names.last',
              ],
            },
          },
        },
      },
      {
        nested: {
          path: 'addresses',
          query: {
            multi_match: {
              query: text,
              type: 'cross_fields',
              fields: [
                'addresses.city',
                'addresses.state',
              ],
            },
          },
        },
      },
    ],
  },
},

Mark_Harwood · July 24, 2017, 9:47am

Careful with blanket statements like this, otherwise a man called Francisco may suddenly appear in a lot of results.

Ideally you need to consider on a per-query basis what is the most likely context for each word provided.
This is what cross_fields will try to do for you, because otherwise (as you may have already discovered) default Lucene ranking favours rare terms and therefore the most unlikely context (names.first:Francisco vs. addresses.city:Francisco).
However, given you have separate clauses for address and person names the cross_fields logic will not figure out which is the most likely choice between the two contexts.
If you were to use copy_to to root-level person and address index fields you could use a single cross_fields multi-match across these fields and it would determine the most appropriate scoring context for each search term. Or alternatively you could just blend all of the terms into a single "all" type field using copy_to and use a regular single-field match to avoid the oddities of right-word-wrong-context matching that Lucene suffers from.
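As a rough sketch of the copy_to suggestion above: the mapping below copies the nested name and address sub-fields into two root-level fields, and the query runs one cross_fields multi_match over them. The field names person_names and person_addresses and the helper buildCrossFieldsQuery are made up for illustration, and the 'text' field type assumes a 5.x-or-later mapping, so check this against your own version before relying on it.

```javascript
// Hypothetical mapping: person_names and person_addresses are illustrative
// root-level fields, not names from the original post.
const mappings = {
  properties: {
    username: {type: 'text'},
    person_names: {type: 'text'},
    person_addresses: {type: 'text'},
    names: {
      type: 'nested',
      properties: {
        // copy_to duplicates each value into the root-level field at index time
        first: {type: 'text', copy_to: 'person_names'},
        last: {type: 'text', copy_to: 'person_names'},
      },
    },
    addresses: {
      type: 'nested',
      properties: {
        city: {type: 'text', copy_to: 'person_addresses'},
        state: {type: 'text', copy_to: 'person_addresses'},
      },
    },
  },
};

// One cross_fields query over both root-level fields, so each search term
// is scored against names and addresses together.
function buildCrossFieldsQuery(text) {
  return {
    multi_match: {
      query: text,
      type: 'cross_fields',
      fields: ['person_names', 'person_addresses'],
    },
  };
}
```

The single "all" field variant would be the same idea with every sub-field copied to one field and a plain match query on it. Either way the copied fields no longer know which nested object a value came from.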

joonhocho · July 25, 2017, 9:13am

Thanks for the advice. It makes a lot of sense.

I still wonder if there is a way to boost any should clause without using cross_fields on the root level or copy_to.
By copying to the root level, you lose the relationships between the properties of a nested object.

Mark_Harwood · July 25, 2017, 9:22am

It's not always a problem. In policing systems a criminal may use a number of aliases, and matching any jumbled combination of previous aliases is a feature, not a bug. Obviously there are other scenarios where this is not desirable.

joonhocho · July 25, 2017, 9:42am

Yes, it can be a feature, and I agree. However, sometimes your customers or clients may have their own requirements. I guess there is currently no way to boost one should clause with a simple boost parameter.

I still think there is a dirty way of mimicking it: repeating the same should clause several times to get the same boost effect. I am afraid it will slow down the search, though, since it does the same work multiple times.
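For comparison with duplicating clauses, here is a minimal sketch that puts a boost on each should clause directly. It assumes the generic boost parameter is accepted both by match (in its object form) and by nested, which is worth verifying against the Elasticsearch version in use; the helper name buildBoostedQuery is made up for illustration, and the weights 3 / 2 / 1 come from the original question.

```javascript
// Sketch only: assumes `boost` is supported on `match` and `nested` clauses.
function buildBoostedQuery(text) {
  return {
    bool: {
      should: [
        // username weighted 3x
        {match: {username: {query: text, boost: 3}}},
        {
          nested: {
            path: 'names',
            boost: 2, // names weighted 2x
            query: {
              multi_match: {
                query: text,
                type: 'cross_fields',
                fields: ['names.first', 'names.last'],
              },
            },
          },
        },
        {
          nested: {
            path: 'addresses',
            boost: 1, // same as leaving boost unset
            query: {
              multi_match: {
                query: text,
                type: 'cross_fields',
                fields: ['addresses.city', 'addresses.state'],
              },
            },
          },
        },
      ],
    },
  };
}
```

A boost multiplies the score contributed by that clause, so it shifts the ranking toward name matches but does not guarantee that every name match outranks every address match. The right-word-wrong-context concern raised earlier in the thread still applies.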

Mark_Harwood · July 25, 2017, 9:56am

The sad truth is users (including myself) are lazy and want to just type words into a single search box and get magic.
However, these are bad queries because much is typically left unstated:

Does word1 relate to field X or field Y? Francisco the name or the town?
Do word1 and word2 even relate to the same nested person or are they about different people?

More structured form-based UIs or "did you mean?" type clarifications will help answer the above questions, but without this added help from the user it is hard to make a clever solution. Perhaps indexing with shingles will help contextualize input better, e.g. the meaning of "diego" changes depending on whether the shingle is "san diego" or "diego maradona". Surrounding words influence the context.
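To make the shingle idea a bit more concrete, here is a small sketch of index settings with a two-word shingle filter, plus a plain function that shows what word pairs look like. The filter name word_pairs, the analyzer name shingled_text, and the shingles helper are all made up for illustration; the helper only approximates what the analyzer would emit (it splits on whitespace and skips the single-word tokens).

```javascript
// Hypothetical analysis settings: a custom analyzer that emits single words
// plus two-word shingles.
const settings = {
  analysis: {
    filter: {
      word_pairs: {
        type: 'shingle',
        min_shingle_size: 2,
        max_shingle_size: 2,
        output_unigrams: true, // keep the single words as well
      },
    },
    analyzer: {
      shingled_text: {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'word_pairs'],
      },
    },
  },
};

// Illustration only: lowercases, splits on whitespace, and joins each run of
// `size` adjacent words into one shingle.
function shingles(text, size = 2) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + size <= tokens.length; i++) {
    out.push(tokens.slice(i, i + size).join(' '));
  }
  return out;
}
```

With this, shingles('san diego') and shingles('diego maradona') each produce one distinct pair, so "diego" is no longer scored purely as an isolated term.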

system · August 22, 2017, 9:57am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
