Let's say you were trying to model a "user" and decided that the following schema would be good enough for the job:
- first, the user's first name
- last, the user's last name
- custom, an array of custom fields (i.e. objects with two properties: name and value)
With this in mind, you might end up creating an index as follows:
PUT /test-index-20221125-nrveqscnji
{
"mappings": {
"properties": {
"first": { "type": "text" },
"last": { "type": "text" },
"custom": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "text"
}
}
}
}
}
}
Let's add a couple of users to the index. Bill Gates, living in Seattle, and founder of the Bill & Melinda Gates foundation:
PUT /test-index-20221125-nrveqscnji/_doc/1
{
"first": "Bill",
"last": "Gates",
"custom": [
{
"name": "location",
"value": "Seattle"
},
{
"name": "bio",
"value": "Founder of Bill & Melinda Gates foundation"
}
]
}
Ivo Jurek, CEO of Gates Corporation:
PUT /test-index-20221125-nrveqscnji/_doc/2
{
"first": "Ivo",
"last": "Jurek",
"custom": [
{
"name": "title",
"value": "CEO of Gates Corporation"
}
]
}
Now, let's say we wanted to search for all the documents matching "gates"; one could implement that with the following query:
GET /test-index-20221125-nrveqscnji/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"first": {
"query": "gates"
}
}
},
{
"match": {
"last": {
"query": "gates"
}
}
},
{
"nested": {
"path": "custom",
"query": {
"match_phrase_prefix": {
"custom.value": {
"query": "gates"
}
}
}
}
}
]
}
}
}
That will work just fine; but what if we wanted to know why a specific document matched? Named queries can partially help:
GET /test-index-20221125-nrveqscnji/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"first": {
"query": "gates",
"_name": "first"
}
}
},
{
"match": {
"last": {
"query": "gates",
"_name": "last"
}
}
},
{
"nested": {
"path": "custom",
"query": {
"match_phrase_prefix": {
"custom.value": {
"query": "gates"
}
}
},
"_name": "custom"
}
}
]
}
}
}
With the above, we can indeed tell whether a match happened on the first name, on the last name, or on any of the custom fields. But what if (I promise this is the last time) I wanted to know which of the custom fields (i.e. which entry of the nested property) triggered the match? Is it possible to get that from the query itself, or do I have to post-process the query result?
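To make the post-processing option concrete, here is a minimal Python sketch. It assumes each hit's _source has already been fetched as a dict, and the function name matching_custom_fields is made up for illustration. It only approximates match_phrase_prefix for a single-word search term (lowercasing plus whitespace splitting), so it will not reproduce the real analyzer in every case.

```python
def matching_custom_fields(source, term):
    """Return the names of the custom entries whose value contains a word
    starting with `term` (case-insensitive).

    Rough client-side approximation only: Elasticsearch analyzes text with
    its own tokenizer, which this simple whitespace split does not replicate.
    """
    term = term.lower()
    matched = []
    for entry in source.get("custom", []):
        words = entry.get("value", "").lower().split()
        if any(word.startswith(term) for word in words):
            matched.append(entry["name"])
    return matched


# The two documents indexed above, as they would come back in _source.
bill = {
    "first": "Bill",
    "last": "Gates",
    "custom": [
        {"name": "location", "value": "Seattle"},
        {"name": "bio", "value": "Founder of Bill & Melinda Gates foundation"},
    ],
}
ivo = {
    "first": "Ivo",
    "last": "Jurek",
    "custom": [{"name": "title", "value": "CEO of Gates Corporation"}],
}

print(matching_custom_fields(bill, "gates"))  # ['bio']
print(matching_custom_fields(ivo, "gates"))   # ['title']
```

This works without knowing the custom field names up front, but it duplicates the matching logic on the client, which is what I would like to avoid.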
A possible solution would be to write many nested queries, one per custom field, each with a different _name value:
GET /test-index-20221125-nrveqscnji/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"first": {
"query": "gates",
"_name": "first"
}
}
},
{
"match": {
"last": {
"query": "gates",
"_name": "last"
}
}
},
{
"nested": {
"path": "custom",
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"custom.value": {
"query": "gates"
}
}
},
{
"match": {
"custom.name": "location"
}
}
]
}
},
"_name": "custom_location"
}
},
{
"nested": {
"path": "custom",
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"custom.value": {
"query": "gates"
}
}
},
{
"match": {
"custom.name": "bio"
}
}
]
}
},
"_name": "custom_bio"
}
},
{
"nested": {
"path": "custom",
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"custom.value": {
"query": "gates"
}
}
},
{
"match": {
"custom.name": "title"
}
}
]
}
},
"_name": "custom_title"
}
}
]
}
}
}
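Since the three nested clauses differ only in the custom field name, the request body above can be generated rather than written by hand. The following Python sketch shows the idea; the helper name build_named_query is hypothetical, and it assumes the list of custom field names is known in advance.

```python
import json


def build_named_query(term, custom_field_names):
    """Build the bool/should query shown above, adding one named nested
    clause per custom field name."""
    should = [
        {"match": {"first": {"query": term, "_name": "first"}}},
        {"match": {"last": {"query": term, "_name": "last"}}},
    ]
    for name in custom_field_names:
        should.append({
            "nested": {
                "path": "custom",
                "query": {
                    "bool": {
                        "must": [
                            {"match_phrase_prefix": {"custom.value": {"query": term}}},
                            {"match": {"custom.name": name}},
                        ]
                    }
                },
                # Same naming scheme as the hand-written query: custom_<name>
                "_name": f"custom_{name}",
            }
        })
    return {"query": {"bool": {"should": should}}}


# Reproduces the request body above for the three known custom fields.
body = build_named_query("gates", ["location", "bio", "title"])
print(json.dumps(body, indent=2))
```

The generated body grows by one nested clause per custom field, so this only automates the typing; it does not make the query any smaller.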
But what if (I know, I lied) you don't know the full list of custom fields up front or, even worse, the list is large (in the 100-1000 range)?
Thanks in advance,
M.