Bool query in array field

Hi,
I have a very particular issue concerning querying over a boolean field and a string field which are nested to an array field. The index mapping is as follow:

indexes :string_field_1, type: 'string'
indexes :string_field_2, type: 'string'
indexes :boolean_field_1, type: 'boolean'
indexes :array_field_1 do
           indexes :boolean_field_2, type: 'boolean'
           indexes :string_field_3, type: 'string'
end
indexes :array_field_2 do
           indexes :integer_field_1, type: 'integer'
end
indexes :array_field_3 do
           indexes :integer_field_2, type: 'integer'
end

The document index also has many other fields which are not nested to the array field, but have to be included among the query fields.
I have tried an approach using filter and bool queries that is as follow:

"query":
        {"bool":
                {"must":
                        [
                                {"query_string":
                                        {"query":"text which is being searched",
                                        "fields":[
                                                "string_field_1",
                                                "string_field_2",
                                                "array_field_1.string_field_3"
                                                ],
                                        "fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"}
                                }
                        ],
                        "filter":[
                                {"bool":
                                        {"must":
                                                [
                                                        {"bool":
                                                                {"should":
                                                                        [
                                                                                {"term":{"boolean_field_1":false}},
                                                                                {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                                                                {"term":{"array_field_3.integer_field_2":y}}]}},
                                                        {"bool":
                                                                {"should":
                                                                        [
                                                                                {"term":{"array_field_1.boolean_field_2":true}},
                                                                                {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                                                                {"term":{"array_field_3.integer_field_2":y}}]}},
                                                                        ]
                                                                }
                                                        }
                                                ]
                                        }
                                }
                        ]
                }
}

The problem with this query is that it is returning a document which, in my opinion, doesn't have to be returned.
The document, in this case, is the bellow:

_source": {
	"string_field_1": "text 1",
	"string_field_2": "text 2",
	"boolean_field_1": false, 
	"array_field_1": [
		{
			"boolean_field_2": true,
			"string_field_3": "some text which is not being searched"
		},
		{
			"boolean_field_2": true,
			"string_field_3": "some text which is not being searched"
		},
		{
			"boolean_field_2": false,
			"string_field_3": "text which is being searched"
		},
		{
			"boolean_field_2": true,
			"string_field_3": "some text which is not being searched"
		}
	],
	"array_field_2": [
		{
			"integer_field_1": A
		}
	],
	"array_field_3": [
		{
			"integer_field_2": B
		}
	]
}

As you can notice, the third item of array_field_1 contains boolean_field_2: false and also the text which is being searched. But, according to my filter: clause, only the documents which array_field_1.boolean_field_2 is true have to be retrieved, unless array_field_2.integer_field_1: or array_field_3.integer_field_1 occurs, which is not true, according to my query part.
It seems elastic is not considering that the array_field_1[2] is the one that the boolean_field_2 is false.
How can I make my query so that this document isn't retrieved?
Thanks is advance,
Guilherme

Another approach consists of putting the array_field_1.string_field_3 query together with the bool query related to the boolean field:

"query":{
	"bool":{
		"should":
		[
			{
				"query_string":
					{
						"query":"text which is being searched",
						"fields":
							[
								"string_field_1",
                        		"string_field_2"
                        	],
                        	"fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
                    }
            },
            {
            	"bool":{
            		"must":
            		[
            			{
            				"query_string":
            				{
            					"query":"text which is being searched",
            					"fields":["array_field_1.string_field_3"],
            					"fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
            				}
            			},
            			{
            				"bool":{
            					"should":
                                [
                                    {"term":{"array_field_1.boolean_field_2":true}},
                                    {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                    {"term":{"array_field_3.integer_field_2":y}}
                                ]
							}
						}
					]
				}
			}
		],
		"filter":
		[
			{
				"bool":{
					"should":
					[
                    	{"term":{"boolean_field_1":false}},
                        {"terms":{"array_field_2.integer_field_1":[x,z]}},
                        {"term":{"array_field_3.integer_field_2":y}}
                    ]
				}
			}
		]
	}
}

This query also retrieves the document, unfortunately. I really do not know how to build this query properly.

The query above is organized as:
(X) OR (A AND (B OR C OR D))

Thanks in advance,
Guilherme

You probably want to have a look at nested fields: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/nested.html

Thanks @jpountz, I'll check it out .

@jpountz, I have changed my query so that the second "must" clause includes the "nested" parameter. I also changed the index mapping to define type: 'nested' to the array_field_1 (I also reindex the data).

The index mapping was updated to:

indexes :array_field_1, type: 'nested' do
           indexes :boolean_field_2, type: 'boolean'
           indexes :string_field_3, type: 'string'
end

The query was updated to:

"query":{
	"bool":{
		"should":
		[
			{
				"query_string":
					{
						"query":"text which is being searched",
						"fields":
							[
								"string_field_1",
                        		"string_field_2"
                        	],
                        	"fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
                    }
            },
            {
                 nested: {
                 path: 'array_field_1',
                 query: {
            	"bool":{
            		"must":
            		[
            			{
            				"query_string":
            				{
            					"query":"text which is being searched",
            					"fields":["array_field_1.string_field_3"],
            					"fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
            				}
            			},
            			{
            				"bool":{
            					"should":
                                [
                                    {"term":{"array_field_1.boolean_field_2":true}},
                                    {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                    {"term":{"array_field_3.integer_field_2":y}}
                                ]
							}
						}
					]
				}
                           }
                         }
			}
		],
		"filter":
		[
			{
				"bool":{
					"should":
					[
                    	{"term":{"boolean_field_1":false}},
                        {"terms":{"array_field_2.integer_field_1":[x,z]}},
                        {"term":{"array_field_3.integer_field_2":y}}
                    ]
				}
			}
		]
	}
}

But, unfortunately, the document is still being retrieved. Do I also have to map array_field_2 and array_field_3 with type: 'nested'? Do they have to be inside a 'nested' parameter too? I did not make that way because I am not querying for more than one field for each of this array fields.

Do you have any idea what have to be changed?

Thanks in advance,
Guilherme

As you can notice, the should clause contains two sub-clauses related to array objects which are not nested to array_field_1. Do I have to use reverse_nested: here? The problem is that it seems reverse_nested: parameter are allowed only in aggregation part, isn't it?

To avoid the nested issue related to the bool should clause concerning the array_field_2 and array_field_3 fields, I divided the second should sub-query so that I can work with the arrays separately. The query was updated to. I also removed the filter clause, just to test the bool queries:

"query":{
	"bool":{
		"should":
		[
			{
				"query_string":
					{
						"query":"text which is being searched",
						"fields":
							[
								"string_field_1",
                        		                       "string_field_2"
                        	],
                        	"fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
                    }
            },
            {
                 bool: {
                                   should:[
                                       {
                                           query:{
                                               nested: {
                                                   path: 'array_field_1',
                                                   query: {
                                                       bool: {
                                                           must: [
                                                               { match: { "array_field_1.string_field_3": "text which is being searched"} },
                                                               {term: {"array_field_1.boolean_field_2": true}}
                                                           ]
                                                       }
                                                  }
                                              }
                                          }
                                       },
                                       {
                                          bool:
                                          {
                                            must: [
                                                { match: { "array_field_1.string_field_3": "text which is being searched"} },
                                                bool:{
                                                    should: [
                                                       {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                                       {"term":{"array_field_3.integer_field_2":y}}
                                                    ]
                                                }
                                            ]
                                          }
                                       }
                                   ]
                               }
	    }
	]
	}
}

But, unfortunately, it still didn't work. I'm tired....

Thanks in advance, Guilherme

Why did you replace the two must/filter clauses with two should clauses in your latest comments?

That's exactly what I've done! And it worked! I couldn't post it yet, but tomorrow I'll do it.

Thanks a lot, @jpountz! The "nested" tip was very helpful!

That was my solution:

"query":{
    "bool":{
        "should":
        [
            {
                "query_string":
                    {
                        "query":"text which is being searched",
                        "fields":
                            [
                                "string_field_1",
                                                       "string_field_2"
                            ],
                            "fuzziness":"1","analyze_wildcard":true,"auto_generate_phrase_queries":false,"analyzer":"brazilian","default_operator":"AND"
                    }
            },
            {
                 bool: {
                                   should:[
                                       {
                                           query:{
                                               nested: {
                                                   path: 'array_field_1',
                                                   query: {
                                                       bool: {
                                                           must: [
                                                               { match: { "array_field_1.string_field_3": "text which is being searched"} },
                                                               {term: {"array_field_1.boolean_field_2": true}}
                                                           ]
                                                       }
                                                  }
                                              }
                                          }
                                       },
                                       {
                                          bool:
                                          {
                                            must: [
                                             {
                                                     query:{
                                                         nested: {
                                                             path: 'movimentos',
                                                             query: {
                                                                 bool: {
                                                                     must: [
                                                                         { match: { "array_field_1.string_field_3": "text which is being searched"} },
                                                                         {term: {"array_field_1.boolean_field_2": false
                                                                     ]
                                                                 }
                                                             }
                                                         }
                                                     }
                                                },
                                                {
                                                  query: {
                                                    bool: {
                                                            should: [
                                                              {"terms":{"array_field_2.integer_field_1":[x,z]}},
                                                              {"term":{"array_field_3.integer_field_2":y}}
                                                            ]
                                                        }
                                                      }
                                                }
                                              ]
                                          }
                                       }
                                   ]
                               }
        }
    ]
    }
}

Thank you very much,

Guilherme

1 Like

@jpountz, after changing my :array_field_1 to type: 'nested', I am now facing a problem with the aggregation clause. The :array_field_1 has an object field :object_1 and there is an aggregation related to it.

indexes :array_field_1, type: 'nested' do
   indexes :string_field_3, type: 'string'
   indexes :object_1 do
     indexes :string_field_raw, type: 'string', index: 'not_analyzed'
   end
end

The aggregation, before I changed :array_field_1 was defined as:

object_1: {
                  filter: {}, aggs: {
                      object_1_range: { terms: { field: 'array_field_1.object_1.string_field_raw' }} }
              }

Now, it was changed to, based on Elastic Reference (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-aggregations-bucket-nested-aggregation.html):

object_1: {
                  nested: { path: 'array_field_1'},
                  aggs: {
                      object_1_range: { terms: { field: 'array_field_1.object_1.string_field_raw' }} }
              }

The aggregation is working, as the buckets are being filled properly. But, when the term is selected, the querying isn't retrieving anything. The query, after aggregation term selection, is as follow:

{match: 
{"array_field_1.object_1.string_field_raw"=>{query: "Selected aggregation term", boost: "1.4", operator: "AND"}}}]
}
}

This match clause is included in the outer must clause of the query:

That was the final query after aggregation selection:

"query: {
"bool: {
"must": [
{
   First_must_clause_mentioned_before 
},
{match: 
    {"array_field_1.object_1.string_field_raw"=>{query: "Selected aggregation term", boost: "1.4", operator: "AND"}}}]
    }
    }
]
}
}

How can I query for another term related to array_field_1 that is not inside the "nested" clause? For example, if I have an array:
[ item 1], [item 2]
and I want to be returned the documents which have both itens in :array_field_1?
Is it possible, as I am already querying with nested clause before?
Do I need to change my aggregation clause?

Thank in advance,

Guilherme