Execute a filter on nested document only if it exists


(pungent) #1

I am using ES 2.3 and have a query whose filter section looks like this:

"filter": {
    "query": {
      "bool": {
        "must": [
          {
            "nested": {
              "path": "employees",
              "query": {
                "bool": {
                  "must": [ 
                    {
                      "range": {
                        "employees.max_age": {
                          "lte": 50
                        }
                      }
                    }, 
                    {
                      "range": {
                        "employees.min_age": {
                          "gte": 20
                        }
                      }
                    }
                  ]
                }
              }
            }
          }, 
          {
            "exists": {
              "field": "employees"
            }
          },
          {
            #....other filter here based on root document, not on nested employee document
          }
        ]
      }
    }
  }
}

I have a filter that checks some conditions on the nested document "employees" inside a bigger document called company. I want this filter to run only if the "employees" object exists, because some documents may not have that nested document at all. So I added {"exists": {"field": "employees"}}, but this doesn't seem to work. Any idea what I should change to get it to work?


(Mark Harwood) #2

In what way is it broken?
False positives or false negatives? Either way I don't think you need the "exists" clause here.

The data model seems unusual here. If I read it right, you have a company doc with a nested array of employee objects, each of which seems to have a min_age and a max_age value rather than recording an actual age of the employee or a birthdate?
FYI if that is the case and age ranges are what you record then this new feature coming in 5.2 may be of interest: https://www.elastic.co/blog/numeric-and-date-ranges-in-elasticsearch-just-another-brick-in-the-wall


(pungent) #3

@Mark_Harwood thanks for the suggestion. Regarding "data model seems unusual": I intentionally did not include my whole data model and actual field names here, so let's ignore that part.

What I am looking for is this: if a company document has no nested employees document, return that company document; if it does have one, run the filter. Right now, a document without the nested employees document is not returned at all, because the filter is run against a non-existent employees document. So what should I change in order to skip the filter when employees does not exist? Something like: if employee.exists { run filter } else { return company doc }. Make sense?


(Mark Harwood) #4

OK I think I got it.
You need a top-level OR: either the company has no employees, or it has employees that match what you want. That needs a bool with a should clause. Try this:

DELETE test
PUT test
{
   "settings": {
	  "index": {
		 "number_of_shards": 1
	  }    
   },
   "mappings": {
	  "company": {
		 "properties": {
			"name": {
			   "type": "text"
			},
			"employees":{
				"type":"nested",
				"properties":{
					"age":{
						"type":"integer"
					}
				}
			}
		 }
	  }
   }
}
POST test/company/1
{
	"name":"no employees"
}
POST test/company/2
{
	"name":"Some employees",
	"employees":[
		{"age":20}
	]
}
GET test/company/_search
{
   "query": {
	  "bool": {
		 "should": [
			{
			   "bool": {
				  "must_not": [
					 {
						"nested": {
						   "path": "employees",
						   "query": {
							  "exists": {
								 "field": "employees.age"
							  }
						   }
						}
					 }
				  ]
			   }
			},
			{
			   "nested": {
				  "path": "employees",
				  "query": {
					 "match": {
						"employees.age": 20
					 }
				  }
			   }
			}
		 ]
	  }
   }
}

(pungent) #5

@Mark_Harwood First of all, I appreciate you taking the time to write up the solution end to end with an example. You made my day. Your solution does work, but it misses one key point. As I mentioned in my question, I also have other filters on the root document, and those have to be must clauses.

{ #....other filter here based on root document, not on nested employee document }

So suppose that, in your example, I add a few more documents as follows:

POST test/company/3
{
	"name":"Some employees",
	"employees":[
		{"age":40}
	]
}

POST test/company/4
{
	"name":"Some employees",
	"employees":[
		{"age":30}
	]
}

I am only interested in documents where the company name condition matches, i.e. company.name == "Some employees". Since you are using should at the top level, that won't be possible: if I add this condition to the query, it will also pull the document where company name == "no employees". Got complex :slight_smile:


(pungent) #6

@Mark_Harwood

If you map it to a SQL analogy, this is what it would be: select * from company where (company.employee.age > 10 or company.employee is null) and company.name = 'some employee'
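
The condition in the SQL analogy above can be sketched in plain Python over the four test documents from this thread, to make the intended matches explicit. This is only an in-memory model, not an Elasticsearch call: the helper names (`employee_clause`, `matches`, `search`) are hypothetical, and the name check is an exact string comparison, which is an assumption (a `match` query on an analyzed field behaves differently).

```python
# In-memory model of the condition discussed in this thread; no Elasticsearch needed.
# The documents mirror test/company/1..4 from the posts above.
companies = [
    {"id": 1, "name": "no employees"},
    {"id": 2, "name": "Some employees", "employees": [{"age": 20}]},
    {"id": 3, "name": "Some employees", "employees": [{"age": 40}]},
    {"id": 4, "name": "Some employees", "employees": [{"age": 30}]},
]


def employee_clause(doc, min_age):
    """True if the doc has no employees, or at least one employee older than min_age."""
    employees = doc.get("employees") or []
    if not employees:
        return True  # the "company.employee is null" branch
    return any(e["age"] > min_age for e in employees)


def matches(doc, name, min_age):
    """Root-level condition AND (no employees OR a matching employee)."""
    return doc["name"] == name and employee_clause(doc, min_age)


def search(docs, name, min_age):
    """Return the ids of all docs satisfying the combined condition."""
    return [d["id"] for d in docs if matches(d, name, min_age)]


print(search(companies, "Some employees", 10))
print(search(companies, "Some employees", 25))
```

The point of the model is that the "no employees" branch only widens the employee condition; the root-level name condition still applies to every document.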


(Mark Harwood) #7

No problem - glad to be of help.

OK - so you just nest my example bool query, together with the mandatory name criteria, inside the must clause of a new top-level bool, like so:

GET test/company/_search
{
   "query": {
	  "bool": {
		 "must": [
			 {
				 "match":{
					 "name":"employees"
				 }
			 },
			{
			   "bool": {
				  "should": [
					 {
						"bool": {
						   "must_not": [
							  {
								 "nested": {
									"path": "employees",
									"query": {
									   "exists": {
										  "field": "employees.age"
									   }
									}
								 }
							  }
						   ]
						}
					 },
					 {
						"nested": {
						   "path": "employees",
						   "query": {
							  "match": {
								 "employees.age": 20
							  }
						   }
						}
					 }
				  ]
			   }
			}
		 ]
	  }
   }
}
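
For reference, here is a minimal sketch of how the same pattern might be applied to the range conditions from the first post, built as plain Python dicts with no client library. The helper names `optional_nested_filter` and `build_query` are hypothetical. Using `employees.min_age` as the exists marker is an assumption: it only works if every nested employee document carries that field.

```python
import json


def optional_nested_filter(path, marker_field, nested_query):
    """Match docs with no nested docs under `path`, or with one satisfying nested_query.

    marker_field is assumed to be present on every nested document.
    """
    has_any = {"nested": {"path": path, "query": {"exists": {"field": marker_field}}}}
    has_match = {"nested": {"path": path, "query": nested_query}}
    # should acts as the OR; must_not(has_any) is the "no employees" branch.
    return {"bool": {"should": [{"bool": {"must_not": [has_any]}}, has_match]}}


def build_query(root_clauses, employee_clause):
    """Combine mandatory root-document clauses with the optional nested clause."""
    return {"query": {"bool": {"must": list(root_clauses) + [employee_clause]}}}


# Range conditions copied from the filter in the first post.
age_ranges = {
    "bool": {
        "must": [
            {"range": {"employees.max_age": {"lte": 50}}},
            {"range": {"employees.min_age": {"gte": 20}}},
        ]
    }
}

query = build_query(
    [{"match": {"name": "employees"}}],
    optional_nested_filter("employees", "employees.min_age", age_ranges),
)
print(json.dumps(query, indent=2))
```

The printed JSON can be pasted as the body of a search request; any other root-document clauses would go into the first argument of `build_query`.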

(pungent) #8

Perfect, it worked :slight_smile: Thank you so much @Mark_Harwood .

