Ignore common terms in field-based query

Let's say I have a search that matches two inputs (first name and last name) against first name and last name fields in my index.

My index contains very clean data - good first names, good last names.

However, the data entry driving the search query has poor quality data. Often the first name may actually contain a prefix such as "Mrs." instead of a first name like "Jessica".

So I may encounter queries such as "First=Mrs, Last=Robinson".

I'd like to define a list of words such as "Mrs, Mr, Ms, Dr, Prof" where if those are given as the First Name to search for, I'd want my search to behave as if they passed null/blank for first name instead, and focus on searching for Last name exclusively. However if a good first name (ie not one of those words) is given, I'd want the search to treat first name and last name equally in weight.

Is this possible to do?

What I've tried:

  • stop words / token filter: most advice I've found as in respect to the index / data ingestion phase, rather than cleaning up data inside a query.
  • script template + conditional clauses: This seems close, but it seems like I can only consider a field's presence in a query rather than react to a specific set of values in that field when present

How are you running the query currently? When a term is not found in a field, it doesn't contribute to the score so having tokens that never occur shouldn't be a big deal? Unless you are requiring that both the first name and the last name match?

Have you looked into configuring a search_analyzer on your first_name field that configures stop words?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.