Can I label which entry of a nested property triggered a match in a nested query?

Let's say you were trying to model a "user" and decided that the following schema would be good enough for the job:

  • first, the user's first name
  • last, the user's last name
  • custom, an array of custom fields (i.e., objects with two properties: name and value)

With this in mind, you might end up creating an index as follows:

PUT /test-index-20221125-nrveqscnji
{
  "mappings": {
    "properties": {
      "first": { "type": "text" },
      "last": { "type": "text" },
      "custom": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "value": {
            "type": "text"
          }
        }
      }
    }
  }
}

Let's add a couple of users to the index. First, Bill Gates, who lives in Seattle and founded the Bill & Melinda Gates Foundation:

PUT /test-index-20221125-nrveqscnji/_doc/1
{
  "first": "Bill",
  "last": "Gates",
  "custom": [
    {
      "name": "location",
      "value": "Seattle"
    },
    {
      "name": "bio",
      "value": "Founder of Bill & Melinda Gates foundation"
    }
  ]
}

Then Ivo Jurek, CEO of Gates Corporation:

PUT /test-index-20221125-nrveqscnji/_doc/2
{
  "first": "Ivo",
  "last": "Jurek",
  "custom": [
    {
      "name": "title",
      "value": "CEO of Gates Corporation"
    }
  ]
}

Now, let's say we want to find all the documents matching "gates"; we could do that with the following query:

GET /test-index-20221125-nrveqscnji/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "first": {
              "query": "gates"
            }
          }
        },
        {
          "match": {
            "last": {
              "query": "gates"
            }
          }
        },
        {
          "nested": {
            "path": "custom",
            "query": {
              "match_phrase_prefix": {
                "custom.value": {
                  "query": "gates"
                }
              }
            }
          }
        }
      ]
    }
  }
}

That works just fine, but what if we want to know why a specific document matched? Named queries can partially help:

GET /test-index-20221125-nrveqscnji/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "first": {
              "query": "gates",
              "_name": "first"
            }
          }
        },
        {
          "match": {
            "last": {
              "query": "gates",
              "_name": "last"
            }
          }
        },
        {
          "nested": {
            "path": "custom",
            "query": {
              "match_phrase_prefix": {
                "custom.value": {
                  "query": "gates"
                }
              }
            },
            "_name": "custom"
          }
        }
      ]
    }
  }
}
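When a request uses named queries, Elasticsearch annotates each hit with a matched_queries entry listing the _name of every clause that matched it. A minimal Python sketch, assuming the search response has already been decoded into a dict (the helper name matched_names_by_id is hypothetical), collects those names per document. It treats a missing key as "no named query matched" and, defensively, also accepts an object keyed by query name, a shape newer versions can return when named-query scores are requested.

```python
def matched_names_by_id(response: dict) -> dict[str, list[str]]:
    """Map each hit's _id to the names of the named queries it matched.

    Hypothetical helper: `response` is a decoded _search response body.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        # The key may be absent when no named query matched this hit.
        matched = hit.get("matched_queries", [])
        # Usually a list of names; if it is an object keyed by name
        # (e.g. when named-query scores are requested), keep the keys.
        names = list(matched.keys()) if isinstance(matched, dict) else list(matched)
        result[hit["_id"]] = names
    return result
```

With the request above, this tells you whether "first", "last", or "custom" fired for each document, but not which custom entry.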

With these named queries, we will indeed be able to tell whether a match happened on the first name, the last name, or any of the custom fields. But what if (I promise this is the last time) I wanted to know which of the custom fields (i.e., which entry of the nested property) triggered the match? Is that possible, or do I have to post-process the query results to achieve it?
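If post-processing turns out to be the fallback, one rough client-side option is to re-check each hit's _source and report the custom entries whose value contains a word starting with the search term. This Python sketch is an assumption, not something from the thread: the function name matching_custom_entries is hypothetical, and the check only approximates a single-word match_phrase_prefix on a text field using the standard analyzer (lowercasing, splitting on non-word characters). Multi-word queries or custom analyzers would need a different check.

```python
import re


def matching_custom_entries(source: dict, query: str) -> list[str]:
    """Return the `name` of each custom entry whose value has a token
    starting with `query` (case-insensitive).

    Rough approximation of a single-word match_phrase_prefix against a
    standard-analyzed text field; not a faithful re-implementation.
    """
    prefix = query.lower()
    names = []
    for entry in source.get("custom", []):
        # Lowercase and split on non-word characters, roughly like the
        # standard analyzer does.
        tokens = re.findall(r"\w+", entry.get("value", "").lower())
        if any(token.startswith(prefix) for token in tokens):
            names.append(entry["name"])
    return names
```

For Bill Gates's document this yields ["bio"], and for Ivo Jurek's it yields ["title"].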

A possible solution would be to write one nested query per custom field, each with a different _name value:

GET /test-index-20221125-nrveqscnji/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "first": {
              "query": "gates",
              "_name": "first"
            }
          }
        },
        {
          "match": {
            "last": {
              "query": "gates",
              "_name": "last"
            }
          }
        },
        {
          "nested": {
            "path": "custom",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase_prefix": {
                      "custom.value": {
                        "query": "gates"
                      }
                    }
                  },
                  {
                    "match": {
                      "custom.name": "location"
                    }
                  }
                ]
              }
            },
            "_name": "custom_location"
          }
        },
        {
          "nested": {
            "path": "custom",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase_prefix": {
                      "custom.value": {
                        "query": "gates"
                      }
                    }
                  },
                  {
                    "match": {
                      "custom.name": "bio"
                    }
                  }
                ]
              }
            },
            "_name": "custom_bio"
          }
        },
        {
          "nested": {
            "path": "custom",
            "query": {
              "bool": {
                "must": [
                  {
                    "match_phrase_prefix": {
                      "custom.value": {
                        "query": "gates"
                      }
                    }
                  },
                  {
                    "match": {
                      "custom.name": "title"
                    }
                  }
                ]
              }
            },
            "_name": "custom_title"
          }
        }
      ]
    }
  }
}
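Writing one clause per field by hand gets repetitive quickly. Assuming the custom field names are known ahead of time, a Python sketch like the following generates the same request body as the one above from a list of names. build_per_field_query is a hypothetical helper, not part of any Elasticsearch client; the resulting dict can be serialized with json.dumps and sent to the _search endpoint.

```python
def build_per_field_query(text: str, field_names: list[str]) -> dict:
    """Build a bool/should query with named clauses for first, last, and
    one nested clause per custom field name (hypothetical helper)."""
    should = [
        {"match": {"first": {"query": text, "_name": "first"}}},
        {"match": {"last": {"query": text, "_name": "last"}}},
    ]
    for name in field_names:
        should.append({
            "nested": {
                "path": "custom",
                "query": {
                    "bool": {
                        "must": [
                            # Same value match as the hand-written request.
                            {"match_phrase_prefix": {"custom.value": {"query": text}}},
                            # Restrict this clause to one custom field.
                            {"match": {"custom.name": name}},
                        ]
                    }
                },
                # Unique label so matched_queries reveals the field.
                "_name": f"custom_{name}",
            }
        })
    return {"query": {"bool": {"should": should}}}
```

This only removes the copy-paste: the request still grows by one nested clause per field name.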

But what if (I know, I lied) you don't know the full list of custom fields up front, or, even worse, the list is quite large (in the 100 to 1,000 range)?

Thanks in advance,
M.
