Search through all document fields (nested and root document) in one multi match query

Let's assume these basic documents as an example:

{
  "name": "pants",
  "description": "with stripes",
  "items": [
    {
      "color": "red",
      "size": "44"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

{
  "name": "pants",
  "description": "with stripes",
  "items": [
    {
      "color": "green",
      "size": "40"
    }
  ]
}

{
  "name": "pants",
  "description": "with dots",
  "items": [
    {
      "color": "green",
      "size": "38"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

What I want is to find the first document with a search term like pants stripes blue 38. All terms should be connected with AND as I'm not interested in pants with dots or other size and color combinations.

My mapping looks like this:

{
  "settings": {
    "index.queries.cache.enabled": true,
    "index.number_of_shards": 3,
    "index.number_of_replicas": 2,
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        },
        "synonym": {
          "type": "synonym_graph",
          "synonyms_path": "dictionaries/de/synonyms.txt",
          "updateable" : true
        }
      },
      "char_filter": {
        "multi_char_filter": {
          "type": "mapping",
          "mappings_path":"analysis/de/multi-char-replacement.txt"
        }
      },
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ],
          "char_filter": ["multi_char_filter"]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonym",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ],
          "char_filter": ["multi_char_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "items": {
        "type": "nested",
        "properties": {
          "color": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          },
          "size": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          }
        }
      }
    }
  }
}

Please ignore the fact that I'm using German stop words and such. I kept the example documents above in English so that everyone can understand them, but I didn't adjust the mapping, as the original example is in German.

So ideally what I want my query to look like is this:

{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "multi_match": {
          "query": "pants stripes blue 38",
          "fields": [
            "name",
            "description", 
            "items.color",
            "items.size"
          ],
          "type": "cross_fields",
          "operator": "and", 
          "auto_generate_synonyms_phrase_query": "false",
          "fuzzy_transpositions": "false"
        }
      }
    }
  }
}

The Search Profiler in Kibana shows that the query is executed like this:

ToParentBlockJoinQuery (
+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)

That looks like exactly the AND and OR logic I need: search every field for every term and connect the per-term results with AND.

But this query seems to only search inside the nested documents. In fact it seems like each query can only search through nested objects or the root document. If I remove the nested part the Search Profiler shows the difference:

{
  "query": {
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description",
        "items.color",
        "items.size"
      ],
      "type": "cross_fields",
      "operator": "and",
      "auto_generate_synonyms_phrase_query": "false",
      "fuzzy_transpositions": "false"
    }
  }
}

// Results in:

+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]

Both queries return zero results.
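One way to see why both variants come back empty is to model how nested documents are stored. This is a minimal sketch, assuming (as is the default for `nested` fields without `include_in_root`) that each item is indexed as its own hidden document holding only the `items.*` fields, while the root document holds only `name` and `description`. Analysis is reduced to lowercased whitespace tokens without stemming or synonyms, and the helper names are made up for illustration.

```python
# Simplified model of cross_fields with operator "and": every term must
# appear in at least one field of ONE indexed document.
def tokens(value):
    return value.lower().split()

def matches_all_terms(indexed_doc, query):
    """True if each query term occurs in some field of this single document."""
    field_tokens = [t for value in indexed_doc.values() for t in tokens(value)]
    return all(term in field_tokens for term in tokens(query))

# Modeled on the first example document, with the name taken to be "pants"
# as in the search terms.
root_doc = {"name": "pants", "description": "with stripes"}
nested_docs = [
    {"items.color": "red", "items.size": "44"},
    {"items.color": "blue", "items.size": "38"},
]

query = "pants stripes blue 38"

# Non-nested query: evaluated against the root document only.
root_hit = matches_all_terms(root_doc, query)
# Nested query: evaluated against each hidden nested document on its own.
nested_hit = any(matches_all_terms(d, query) for d in nested_docs)
# What the question asks for: root fields and one item's fields seen together.
combined_hit = any(matches_all_terms({**root_doc, **d}, query) for d in nested_docs)

print(root_hit, nested_hit, combined_hit)  # prints: False False True
```

In this model, neither kind of document contains all four terms on its own, which is consistent with both queries returning nothing.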

So my question is whether there is a way to make the above query work, so that a single multi match query truly searches across all defined fields (nested and root document) on a term-by-term basis.

I would like to avoid preprocessing the search terms to split them up based on whether they belong to a nested or the root document, as that has its own set of challenges. But I do know that this would be a solution to my problem.

Edit
The original documents have a lot more attributes. The root document might have up to 250 fields, and each nested document might add another 20-30 fields to it. Because the search terms need to be matched against a lot of those fields (possibly not all, but most), any sort of concatenation of nested and root document attributes to make them "searchable" seems impractical.

A flattened index might be a practical solution. By that I mean copying all of the root document's fields to each nested document and indexing only the nested docs. But in this question I would like to know whether it can also work with nested objects.
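The flattened-index idea could be sketched like this before indexing. It is only an illustration: `denormalize` is a hypothetical helper, not part of any Elasticsearch client, and it assumes root and item field names do not collide (an item field would silently win if they did).

```python
def denormalize(doc, nested_key="items"):
    """Return one flat document per nested item, each carrying the root fields."""
    root_fields = {k: v for k, v in doc.items() if k != nested_key}
    return [{**root_fields, **item} for item in doc.get(nested_key, [])]

doc = {
    "name": "pants",
    "description": "with stripes",
    "items": [
        {"color": "red", "size": "44"},
        {"color": "blue", "size": "38"},
    ],
}

flat_docs = denormalize(doc)
for flat in flat_docs:
    print(flat)
```

Each resulting document holds one color and size pair next to the root fields, so a plain cross_fields query with operator "and" would see all four terms in one place. The price is that every root field is repeated once per item.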

I think you might have to split the query into two parts and add both to the should clause of a bool query:

  • One part will apply only to the parent data
// Parent part
{
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description"
      ]
    }
}

  • The other part will be applied to the nested data with a nested query
// Nested part
{
  "nested": {
    "path": "items",
    "query": {
      "multi_match": {
        "query": "pants stripes blue 38",
        "fields": [
          "items.color",
          "items.size"
        ]
      }
    }
  }
}

Maybe this could help?
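Put together, the two parts might look like the sketch below. It only builds the request body as a Python dict; `split_query` is a name invented for illustration, and the body would still have to be sent to the search endpoint with whatever client is in use.

```python
import json

def split_query(text, root_fields, nested_path, nested_fields):
    """Build a bool/should body with one parent part and one nested part."""
    parent_part = {"multi_match": {"query": text, "fields": root_fields}}
    nested_part = {
        "nested": {
            "path": nested_path,
            "query": {"multi_match": {"query": text, "fields": nested_fields}},
        }
    }
    return {"query": {"bool": {"should": [parent_part, nested_part]}}}

body = split_query(
    "pants stripes blue 38",
    root_fields=["name", "description"],
    nested_path="items",
    nested_fields=["items.color", "items.size"],
)
print(json.dumps(body, indent=2))
```

With only should clauses, a document is returned as soon as one of the two parts matches, and multi_match defaults to OR between terms, so this form ranks the best match first rather than filtering out partial matches.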

Hey David,

thanks for your reply!

I tried that query already (only change is a must for both queries) and found it to be working. But the results were not ideal.
The best result is definitely at the top of the list, but the query also returned documents which were less fitting.

I think everyone knows a webshop or two where you search for something and only the first 2-3 results are actually what you want, while the rest is only loosely related to your search terms. I would like to avoid that, so I was hoping there is a way to return only documents in which all search terms are found and leave out the rest.
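One possible way to require every term while still letting each term match in either the root or the nested fields is to build one must clause per term. The sketch below is an assumption-laden illustration, not a confirmed solution: `per_term_query` is a hypothetical helper, and it splits the input on whitespace on the client side, which is itself a form of preprocessing and does not know about the analyzer's stop words or multi-word synonyms.

```python
def per_term_query(text, root_fields, nested_path, nested_fields):
    """Require each term to match in the root fields OR in some nested item."""
    must = []
    for term in text.split():
        must.append({
            "bool": {
                "should": [
                    {"multi_match": {"query": term, "fields": root_fields}},
                    {
                        "nested": {
                            "path": nested_path,
                            "query": {
                                "multi_match": {"query": term, "fields": nested_fields}
                            },
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        })
    return {"query": {"bool": {"must": must}}}

body = per_term_query(
    "pants stripes blue 38",
    root_fields=["name", "description"],
    nested_path="items",
    nested_fields=["items.color", "items.size"],
)
```

Note that each term is checked independently, so "blue" and "38" could be satisfied by two different items of the same document. Keeping color and size tied to one item would need the nested terms grouped into a single nested clause.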

I also tried this setup with a similar but slightly more precise result:
Mapping

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "items": {
        "type": "flattened"
      }
    }
  }
}

Example document

POST flattened-test/_doc
{
  "name": "pants",
  "description": "with stripes",
  "items": [
    {
      "color": "red",
      "size": "44"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

Query

{
  "query": {
    "query_string": {
      "fields": [
        "description",
        "name",
        "items.color",
        "items.size"
      ],
      "query": "pants AND stripes AND 38 AND blue"
    }
  }
}

It seems like the flattened type not only flattens the inner objects but also the whole items array. Do you know if this is true or not? I had hoped that it would only flatten each object in the array.
If the latter was achievable we could retain the information which size belongs to which color.
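A rough model of the behaviour described above, assuming the flattened type indexes every leaf value as a keyword under its dotted key path and does not record which array element a value came from. This is a simplification for illustration, not Elasticsearch's actual implementation, and `flatten_leaves` is a made-up helper.

```python
from collections import defaultdict

def flatten_leaves(value, prefix=""):
    """Collect every leaf value under its dotted key path, ignoring array positions."""
    leaves = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for child in node:
                walk(child, path)  # array index is dropped on purpose
        else:
            leaves[path].append(str(node))

    walk(value, prefix)
    return dict(leaves)

items = [
    {"color": "red", "size": "44"},
    {"color": "blue", "size": "38"},
]

indexed = flatten_leaves(items, "items")
print(indexed)  # {'items.color': ['red', 'blue'], 'items.size': ['44', '38']}

# "red" and "38" come from different items but both are present after flattening.
cross_item_match = "red" in indexed["items.color"] and "38" in indexed["items.size"]
print(cross_item_match)  # True
```

Under that assumption, a query for red in size 38 would match this document even though no single item has that combination, which is the loss of association the nested type is meant to prevent.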
