[Highlighting] Performance & Simplicity

I have a scenario where I am forced to filter my results with a strict filter and no query block. As a result, I cannot use highlighting without a highlight_query, because my main query has no query block for the highlighter to work from.

My query is as follows:
{
	query: {
		filtered: {
			filter: {
				bool: {
					must: {
						multi_match: {
							query: 'Samantha',
							fields: ['firstName', 'middleName', 'lastName']
						}
					}
				}
			}
		}
	}
}

In my experience, plain highlighting doesn't work here: there is no query block, only a filter block, so the highlighter doesn't know what to highlight. I assume the following will not work:

	highlight: {
		fields: {
			firstName: {},
			middleName: {},
			lastName: {}
		}
	}

What I did instead was use highlight_query. My concern is that I have to repeat the same query for each field, which seems redundant and unnecessary, and it could blow up my request if I have a lot of fields. I am also worried about performance, since it looks like the same query will be run once per field. This is what I have:

highlight: {
	fields: {
		firstName: {
			highlight_query: {
				bool: {
					must: {
						match: {
							firstName: 'Samantha'
						}
					}
				}
			}
		},
		middleName: {
			highlight_query: {
				bool: {
					must: {
						match: {
							middleName: 'Samantha'
						}
					}
				}
			}
		},
		lastName: {
			highlight_query: {
				bool: {
					must: {
						match: {
							lastName: 'Samantha'
						}
					}
				}
			}
		}
	}
}

Could anyone answer the following questions?

  1. Is there a more generic way of saying "this is my query and these are my fields", other than repeating the query for each field?
  2. Is there a performance cost to repeating the query for each field? Will the same query be run once per field? Is this normal behaviour even with a "multi_match"?
  3. Should I be concerned about the size of my request if I have 100 fields, each with its own query block?

Thanks.
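
On question 1, one possible way to avoid the repetition is sketched below. It assumes that highlight_query is accepted as a global setting at the top level of the highlight block and then applies to every listed field unless a field overrides it; this should be checked against the Elasticsearch version in use. The helper name buildHighlight is hypothetical and only there for illustration.

```javascript
// Hypothetical helper: builds a highlight block with ONE shared highlight_query
// instead of one copy per field.
// Assumption: highlight_query is honoured as a global highlight setting.
function buildHighlight(fields, text) {
  const highlight = {
    // Shared by every field listed below unless a field overrides it.
    highlight_query: {
      multi_match: { query: text, fields: fields }
    },
    fields: {}
  };
  for (const field of fields) {
    highlight.fields[field] = {}; // no per-field query block
  }
  return highlight;
}

const highlight = buildHighlight(['firstName', 'middleName', 'lastName'], 'Samantha');
console.log(JSON.stringify(highlight, null, 2));
```

With this shape the request grows by one empty object per field rather than one full query per field, which also bears on question 3.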

Generally, filters are structured choices that people make e.g. date or price ranges, brand names.

  • They don't normally expect highlighting, and they are common to many searches (e.g. many users wanting to share the same cached date filter for "today").
  • They don't use scoring to make one match better than another.

Things that get highlighted are normally free-text, unstructured searches expressed as queries. They aren't filters common to many users, and they require scoring for relevance ranking. For example, multi_match with cross_fields mode will help score a match on firstName:Samantha higher than one on lastName:Samantha.

So the question is perhaps: why is "Samantha" a filter rather than a query in the first place?
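
As a rough sketch of that suggestion, the free-text part could move into the query block, after which the plain highlight block from the original post has a query to work from. The helper name buildSearchBody is hypothetical, and the cross_fields type is taken from the example above rather than from the original request.

```javascript
// Hypothetical helper: puts the free-text match in the query block so that
// plain highlighting (no highlight_query) has something to highlight.
function buildSearchBody(fields, text) {
  const highlightFields = {};
  for (const field of fields) {
    highlightFields[field] = {}; // default highlighter settings per field
  }
  return {
    query: {
      multi_match: { query: text, type: 'cross_fields', fields: fields }
    },
    highlight: { fields: highlightFields }
  };
}

const body = buildSearchBody(['firstName', 'middleName', 'lastName'], 'Samantha');
console.log(JSON.stringify(body, null, 2));
```

Structured choices such as date ranges could still sit in a filter next to this query; only the text being highlighted needs to be a query.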

That makes sense. I believe it's a filter because we've been following this structure in our app for a long time, and we probably need to change that. Long story short, highlighting is not meant for filters, and using a highlight_query to work around that will probably result in unexpected and undesired behaviour.
