Let's assume these basic documents as an example:
{
"name": "shirt",
"description": "with stripes",
"items": [
{
"color": "red",
"size": "44"
},
{
"color": "blue",
"size": "38"
}
]
}
{
"name": "shirt",
"description": "with stripes",
"items": [
{
"color": "green",
"size": "40"
}
]
}
{
"name": "shirt",
"description": "with dots",
"items": [
{
"color": "green",
"size": "38"
},
{
"color": "blue",
"size": "38"
}
]
}
What I want is to find the first document with a search term like pants stripes blue 38
. All terms should be connected with AND as I'm not interested in pants with dots or other size and color combinations.
My mapping looks like this:
{
"settings": {
"index.queries.cache.enabled": true,
"index.number_of_shards": 3,
"index.number_of_replicas": 2,
"analysis": {
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
},
"synonym": {
"type": "synonym_graph",
"synonyms_path": "dictionaries/de/synonyms.txt",
"updateable" : true
}
},
"char_filter": {
"multi_char_filter": {
"type": "mapping",
"mappings_path":"analysis/de/multi-char-replacement.txt"
}
},
"analyzer": {
"index_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"german_stop",
"german_normalization",
"german_stemmer"
],
"char_filter": ["multi_char_filter"]
},
"search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym",
"german_stop",
"german_normalization",
"german_stemmer"
],
"char_filter": ["multi_char_filter"]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"description": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"items": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"size": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
}
}
}
Please ignore the fact that I'm using german stop words and such. I kept the example files above in english so that everyone can understand it but didn't adjust the mapping as the original example is in german.
So ideally what I want my query to look like is this:
{
"query": {
"nested": {
"path": "items",
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
}
}
And the Search Profiler from Kibana show that the query will be executed like this:
ToParentBlockJoinQuery (
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)
Which looks to be exactly what I need in terms of AND and OR logic. Search through every attribute with every term and connect those results with AND.
But this query seems to only search inside the nested documents. In fact it seems like each query can only search through nested objects or the root document. If I remove the nested part the Search Profiler shows the difference:
{
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
// Results in:
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]
Both queries return zero results.
So my question is if there is a way to make the above query work and to be able to truly search across all defined fields (nested and root doc) within a multi match query on a term by term basis.
I would like to avoid doing any preprocessing to the search terms in order to split them up based on them being in a nested or root document as that has it's own set of challenges. But I do know that that is a solution to my problem.
Edit
The original files have a lot more attributes. The root document might have up to 250 fields and each nested document might add another 20-30 fields to it. Because the search terms need to search through lot of the fields (possibly not all, but most) any sort of concatenation of nested and root document attributes to make them "searchable" seems unpractical.
A flattened index might be a practical solution. By that I mean copying all root documents fields to the nested document and only indexing the nested docs. But in this question I would like to know if it also works with nested objects.