How to execute Must and Should query that matches an individual Nested field?

A record has two items/values indexed in a nested field type addresses.fulladdress.

"11 High Street" and "88 West Avenue"

If a user searched for "11 West" this record should not be matched. How should we structure the query so it only matches where "11" and "West" both appear in a single "fulladdress" value please?

The following query matches any instance of each of the search terms, not where they both appear in a single addresses.fulladdress field. :expressionless:

List<string> searchTerms = new List<string>(){ "11", "West" };
qcd.Nested(c => c.Name("address_query")
     .Boost(1.1)
     .InnerHits(i => i.Explain())
     .Path(p => p.addresses)
     .Query(nq => nq
     .Terms(t => t.Field(f => f.addresses.First().fulladdress).Terms(searchTerms))));

We also have records with person names stored as another nested data type aliases.fullname.
A user wants to find records where one of the alias.fullname properties contains one of the following values derived from them submitting a search for Forename(s): "Robert Andrew" & Surname(s): "SMITH JONES"...

  • Robert SMITH
  • Andrew SMITH
  • Robert Andrew SMITH
  • Robert JONES
  • Andrew JONES
  • Robert Andrew JONES

So one of the alias.fullname fields MUST contain one of the surname values and one or more of the forename values.
When querying against simple separate forename and surname text fields this is easily accomplished with a bool query that has "filter" for the surname .Match query and "should" with MinimumShouldMatch(1) for the forename(s).
How do you achieve the desired result against a nested collection of full names please?
As with the address example, we would not want cross match, so if the record had two alias.fullname values "Robert DAVIS" and "David JONES" we would not want a hit due to "Robert JONES" where it matches surname in one alias.fullname and forename in another.

If you are not familiar with the C# NEST query syntax but could offer advice on how to achieve either of these queries with JSON (Kibana Console) syntax please do share your wisdom.

Thanks in advance of your kind response. :slightly_smiling_face: :slightly_smiling_face: :slightly_smiling_face:

This is an old slide but might help paint the picture.

A single JSON document has nested elements unpacked into adjacent Lucene documents.
The "nested" query essentially acts as a translation point. Whatever criteria you put underneath (including bool queries) matches an inner Lucene document but the nested clause translates any reported matches as a hit on the Lucene document that contains it. The "nested" clause simply establishes a new context for the criteria it contains.

1 Like

Thanks for the reply Mark.

In a relational SQL database the BooleanQuery in the diagram might be...

SELECT * FROM resume
INNER JOIN qualifications ON resume.personID = qualifications.personID
WHERE qualifications.Subject='Maths' AND Grade='A1'

Individual single term exact matches against two column values.

We need the equivalent of

WHERE (aliases.fullname LIKE '%SMITH%' AND (aliases.fullname LIKE '%Robert%' OR aliases.fullname LIKE '%Andrew%'))
OR (aliases.fullname LIKE '%JONES%' AND (aliases.fullname LIKE '%Robert%' OR aliases.fullname LIKE '%Andrew%'))

How do we stop the Elastic query from matching a term in any of the adjacent (nested) documents?

That's the point of the nested clause - it ensures the contained query executes in the context of matching just one child element.
Let's focus then on the query logic you need to put underneath the nested element. If I understand It's more of a question about how this

 (SMITH AND (Robert OR Andrew)) OR (JONES AND (Robert OR Andrew))

Can be expressed in a Bool query. I think we can start by refactoring to the simpler:

  (Robert OR Andrew) AND (SMITH OR JONES)

Which would be

   bool
        must
            bool
                should
                        Robert
                        Andrew
            bool
                should
                        Smith
                        Jones

A Boolean with only "should" clauses essentially acts as an OR.
So take the above and put under your nested clause.

1 Like

Thanks Mark.

I am using the following, which is almost what I desire.

List<QueryContainer> filterQCs = new List<QueryContainer>();
List<QueryContainer> shouldQCs = new List<QueryContainer>();

var surnamesList = PhraseSplit(surnames); // -- Split surname by spaces, except where enclosed in quotes.
var forenamesList = PhraseSplit(forenames);
string analyser = (useSynonyms) ? "synonym" : "standard";

if (surnamesList.Count > 0) {
    if (surnamesList.Any(x => x.Contains(" "))) {
        BoolQuery bqSurname = new BoolQuery();
        bqSurname.MinimumShouldMatch = 1; // -- Only need one to match
        List<QueryContainer> shouldSurnameQCs = new List<QueryContainer>();
        foreach (var surname in surnamesList) {
            var qcdSurname = new QueryContainerDescriptor<MyRecord>();
            if (surname.Contains(" ")) {
                qcdSurname.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.MatchPhrase(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).Query(surname))));
            }
            else {
                qcdSurname.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.Match(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).Query(surname))));
            }
            shouldSurnameQCs.Add(qcdSurname);
        }
        bqSurname.Should = shouldSurnameQCs;
        filterQCs.Add(bqSurname);
    }
    else {
        var qcdSurnames = new QueryContainerDescriptor<MyRecord>();
        qcdSurnames.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.Match(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).MinimumShouldMatch(1).Operator(Operator.Or).Query(surnames))));
        filterQCs.Add(qcdSurnames);
    }
}

if (forenamesList.Count > 0) {
    if (forenamesList.Any(x => x.Contains(" "))) {
        BoolQuery bqForename = new BoolQuery();
        bqForename.MinimumShouldMatch = 1; // -- Only need one to match
        List<QueryContainer> shouldForenameQCs = new List<QueryContainer>();
        foreach (var forename in forenamesList) {
            var qcdForename = new QueryContainerDescriptor<MyRecord>();
            if (forename.Contains(" ")) {
                qcdForename.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.MatchPhrase(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).Query(forename))));
            }
            else {
                qcdForename.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.Match(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).Query(forename))));
            }
            shouldForenameQCs.Add(qcdForename);
        }
        bqForename.Should = shouldForenameQCs;
        filterQCs.Add(bqForename);
    }
    else {
        var qcdForenames = new QueryContainerDescriptor<MyRecord>();
        qcdForenames.Nested(q => q.Path(p => p.fullnames).Query(qn => qn.Match(t => t.Field(f => f.fullnames.First().fullname).Analyzer(analyser).MinimumShouldMatch(1).Operator(Operator.Or).Query(forenames))));
        filterQCs.Add(qcdForenames);
    }
}

BoolQuery boolQuery = new BoolQuery();
boolQuery.Filter = filterQCs;
if (shouldQCs.Count > 0) {
    boolQuery.Should = shouldQCs;
}
sDescriptor.Query(q => boolQuery);

Where a document has two fullname fields in the nested "fullnames" of Peter Rabbit and Mickey Mouse my query would return a match if a user searched for Surname(s) = Mouse and Forename(s) = Peter.
Ideally it should only match records where the combination of Surname(s) and Forename(s) appear within a single document fullname nested field.

I think I need to see JSON not the code that builds it.

The Nest C# library luckily provides a way to review the JSON request... pasted below. :slightly_smiling_face:

{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "minimum_should_match": 1,
            "should": [
              { "match": { "systemId": { "query": "40" } } },
              { "match": { "systemId": { "query": "41" } } }
            ]
          }
        },
        {
          "nested": {
            "path": "fullnames",
            "query": {
              "match": {
                "fullnames.fullname": {
                  "analyzer": "synonym",
                  "minimum_should_match": 1,
                  "operator": "or",
                  "query": "Smith"
                }
              }
            }
          }
        },
        {
          "nested": {
            "path": "fullnames",
            "query": {
              "match": {
                "fullnames.fullname": {
                  "analyzer": "synonym",
                  "minimum_should_match": 1,
                  "operator": "or",
                  "query": "Richard"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 50,
  "sort": [ { "relevantCreatedDateTime": { "order": "desc" } } ]
}

That's what we call "cross-matching". You need one nested query with a single Boolean query that has the 2 must clauses for first name and last name. Like the lower BooleanQuery in my diagram

Thanks Mark.
I will try and implement the "must bool should" you suggested.
Is the should a Match or Terms query please?
Am I right in thinking I can pass an array of forename(s) and array of surname(s) using Terms which the Elastic engine converts to individual Match queries? So I could stick with my "foreach" and explicitly add Match queries for each surname and each forename, provided I bundle them in to a single bool must query?

term and terms queries don't get fed through and Analyzer but match queries do.

Broadly speaking match queries are for what users type (mixed case etc that needs treating) while term queries might come from what people click e.g. top ten results discovered by an aggregation.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.