How to boost nested query under bool should query?

I want to boost clauses individually under a bool.should query.

Here's my data structure:

person = {
  username,
  names: [{first, last}], // nested
  addresses: [{city, state}], // nested
}

I want to boost username by 3, names by 2, and addresses by 1.

How can I achieve this?

The omni query text may be something like "john doe san francisco", and I want results with matched names to rank above results with matched addresses.

Here's my query without boost:

query: {
  bool: {
    should: [
      {match: {username: text}},
      {
        nested: {
          path: 'names',
          query: {
            multi_match: {
              query: text,
              type: 'cross_fields',
              fields: [
                'names.first',
                'names.last',
              ],
            },
          },
        },
      },
      {
        nested: {
          path: 'addresses',
          query: {
            multi_match: {
              query: text,
              type: 'cross_fields',
              fields: [
                'addresses.city',
                'addresses.state',
              ],
            },
          },
        },
      },
    ],
  },
},

Careful with blanket statements like this, otherwise a man called Francisco may suddenly appear in a lot of results.

Ideally you need to consider on a per-query basis what is the most likely context for each word provided.
This is what cross_fields will try to do for you, because otherwise (as you may have already discovered) default Lucene ranking favours rare terms and therefore the most unlikely context (names.first:Francisco vs. addresses.city:Francisco).
However, given that you have separate clauses for addresses and person names, the cross_fields logic cannot work out which of the two contexts is the most likely.
If you were to use copy_to to populate root-level person and address index fields, you could use a single cross_fields multi_match across these fields and it would determine the most appropriate scoring context for each search term. Alternatively, you could blend all of the terms into a single "all"-type field using copy_to and use a regular single-field match to avoid the oddities of right-word-wrong-context matching that Lucene suffers from.
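
To make the copy_to suggestion concrete, here is a rough sketch of what the mapping and query might look like. The root-level field names all_names and all_addresses and both helper function names are made up for illustration, and the fields are assumed to be of type text, which the thread does not state.

```javascript
// Hypothetical mapping: each nested field copies its value into a root-level field.
// `all_names` and `all_addresses` are illustrative names, not from the original post.
function buildCopyToMapping() {
  const copied = (target) => ({ type: 'text', copy_to: target });
  return {
    properties: {
      username: { type: 'text' },
      all_names: { type: 'text' },
      all_addresses: { type: 'text' },
      names: {
        type: 'nested',
        properties: { first: copied('all_names'), last: copied('all_names') },
      },
      addresses: {
        type: 'nested',
        properties: { city: copied('all_addresses'), state: copied('all_addresses') },
      },
    },
  };
}

// A single cross_fields multi_match over the root-level copies.
function buildCrossFieldsQuery(text) {
  return {
    multi_match: {
      query: text,
      type: 'cross_fields',
      fields: ['all_names', 'all_addresses'],
    },
  };
}
```

Because the copies live on the root document, no nested clause is needed in the query, at the cost of losing which first name belonged with which last name.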

Thanks for the advice. It makes a lot of sense.

I still wonder if there is a way to boost an individual should clause without using cross_fields at the root level or copy_to,
because by copying to the root level you lose the relationships between the properties of a nested object.

It's not always a problem. In policing systems a criminal may use a number of aliases, and matching any jumbled combination of previous aliases is a feature, not a bug. Obviously there are other scenarios where this is not desirable.

Yes, it can be a feature, and I agree. However, sometimes your customers or clients may have their own requirements. I guess there is currently no way to boost one should clause with a simple boost parameter.

I still think there is a dirty way of mimicking it: repeating the same should clause multiple times to get the same boost effect. I am afraid it will slow down the search, though, since it does the same work multiple times.
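
For what it's worth, as far as I know most Elasticsearch query clauses, including match and nested, accept a boost parameter, so duplicating clauses should not be necessary. Below is a minimal sketch of the original query with the requested weights of 3, 2 and 1. The helper names nestedClause and buildBoostedQuery are just for illustration, and how much the boosts actually change the ranking depends on the underlying scores.

```javascript
// Builds a nested cross_fields clause carrying its own boost.
// Helper name and signature are illustrative, not part of any client library.
function nestedClause(path, fields, text, boost) {
  return {
    nested: {
      path,
      boost, // assumed: the standard per-clause boost parameter
      query: {
        multi_match: {
          query: text,
          type: 'cross_fields',
          fields: fields.map((f) => `${path}.${f}`),
        },
      },
    },
  };
}

// The original bool.should query, with username x3, names x2, addresses x1.
function buildBoostedQuery(text) {
  return {
    bool: {
      should: [
        // The expanded match form allows a boost next to the query text.
        { match: { username: { query: text, boost: 3 } } },
        nestedClause('names', ['first', 'last'], text, 2),
        nestedClause('addresses', ['city', 'state'], text, 1),
      ],
    },
  };
}
```

The resulting object would go under the query key of the search request body, in place of the unboosted version.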

The sad truth is users (including myself) are lazy and want to just type words into a single search box and get magic.
However, these are bad queries because there is typically much that is left unstated:

  • Does word1 relate to field X or field Y? Francisco the name or the town?
  • Do word1 and word2 even relate to the same nested person or are they about different people?

More structured form-based UIs or "did you mean?"-type clarifications will help answer the above questions, but without this added help from the user it is hard to make a clever solution. Perhaps indexing with shingles will help contextualize input better, e.g. the meaning of "diego" changes depending on whether the search contains the shingle "san diego" or "diego maradona". Surrounding words influence the context.
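
To illustrate what shingles are, here is a small standalone sketch that splits a phrase into two-word shingles. It only mimics the idea in plain JavaScript; in Elasticsearch this would normally be done at analysis time with a shingle token filter, and the function name and whitespace tokenization here are assumptions for the example.

```javascript
// Produces word-level shingles (runs of adjacent words) from a phrase.
// Tokenization is naive: lowercase and split on whitespace.
function shingles(text, size = 2) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = [];
  for (let i = 0; i + size <= words.length; i++) {
    out.push(words.slice(i, i + size).join(' '));
  }
  return out;
}

console.log(shingles('john doe san diego'));
// [ 'john doe', 'doe san', 'san diego' ]
```

A search for "san diego" would then match the shingle "san diego" as a unit, while "diego maradona" would not.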

