Range query has weird results

I observed what I would consider to be "weird" results from a query. I understand that Elasticsearch is built on an inverted index, but I would expect different behavior.

Given a set of documents with calendar dates that denote when someone is 'not available', stored in a nested object:

"nonAvailability": [
                    {
                        "date": "2018-08-01"
                    },
                    {
                        "date": "2018-08-02"
                    }
                ]

I would expect a range query like the one below to return documents that have no nonAvailability entries as well as those whose entries do not fall in the range. However, this query only returns documents that have nonAvailability values. Documents with '[]' values (unlimited availability) are not returned.

"query": {
      	"nested" : {
      		"path" : "nonAvailability",
      		"query" :  {
      			"bool" : {
      				"must_not" : [{
      					"range" : {
      						"nonAvailability.date" : {
      							"gte" : "2018-07-01",
      							"lt" : "2018-07-03",
      							"relation" : "within"
      						}
      					}
      				}]
      			}
      		}
      	}
      }

Is there another parameter I can pass to ensure that all documents are considered, not just the ones that already have some nonAvailability maintained?

Your reasoning makes sense, and this is how it would work if nonAvailability were not a nested field.

For nested fields and nested queries we use a different algorithm:

  1. During indexing, for every nested object inside an array, we create a separate internal Lucene document.
  2. During a nested query, we find matches among these internal documents. We then find the parent document for each nested match and return the parents as hits.

If your array is empty, no nested documents will be created. Hence, no matches will be found for your nested query.
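
To make the two steps above concrete, here is a minimal Python sketch that models them. It is not how Elasticsearch or Lucene is implemented; the sample documents A, B and C and the helper names are made up for illustration. It only shows why a parent with an empty array can never be a hit when must_not sits inside the nested query.

```python
from datetime import date

# Hypothetical sample data: parent id -> nested nonAvailability dates.
docs = {
    "A": [date(2018, 8, 1), date(2018, 8, 2)],  # dates outside the queried range
    "B": [],                                     # unlimited availability
    "C": [date(2018, 7, 1)],                     # date inside the queried range
}

def index_nested(docs):
    """Step 1: one hidden child document per nested object, tagged with its parent."""
    return [(parent, d) for parent, dates in docs.items() for d in dates]

def nested_must_not_range(children, gte, lt):
    """Step 2: apply must_not(range) to each child, then map matching children to parents."""
    return {parent for parent, d in children if not (gte <= d < lt)}

children = index_nested(docs)
hits = nested_must_not_range(children, date(2018, 7, 1), date(2018, 7, 3))
print(sorted(hits))  # ['A']
```

Parent B produces no child documents in step 1, so step 2 has nothing to evaluate for it, and it cannot appear in the hits regardless of the condition.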

Hello!

Thanks for your response.

Logically I understand why this works the way it does based on your explanation.

However, since the query is using “must_not” as the operator, if no nested docs are found as a part of the query, wouldn’t you expect this to evaluate as truthy?

Meaning, if Doc A has some nested values and Doc B has an empty array for that key, a “must_not” query should evaluate as truthy both when Doc A has dates not matching the range and when Doc B has no values at all. That is what someone looking at the data would expect, without knowledge of what’s happening behind the scenes as described in your explanation above.

What you said would make perfect sense for a “must” bool query, but I do not think it is consistent for a “must_not” query.

To “solve” this, if we have no dates to add, we add a date 3 years in the future, and then the query finds all the correct results.

Seems really wonky.

Thanks for your detailed explanation. It makes sense.
I have created an Elasticsearch issue for that: https://github.com/elastic/elasticsearch/issues/34522
We will investigate it.

@titani0us

We have come up with two ways to address your problem, depending on what you expect for parents with mixed children (some that satisfy the must_not criterion and some that don't).
The first is to swap nested and must_not, so that must_not wraps the nested query. This returns parents without any children and parents whose children all satisfy the must_not criterion, but it leaves out parents that have both children that satisfy and children that don't satisfy the must_not criterion:

{
	"query": {
		"bool": {
			"must_not": [
				{
					"nested" : {
						"path" : "nonAvailability",
						"query": {
							"range" : {
								"nonAvailability.date" : {
									"gte" : "2018-07-01",
									"lt" : "2018-07-03",
									"relation" : "within"
								}
							}
						}
					}
				}
			]
		}
	}
}

The second is to combine your original query with another bool query that matches parents without children. The query is more complex, but it should return all necessary parents:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must_not": {
              "nested": {
                "path": "nonAvailability",
                "query": {
                  "exists": {
                    "field": "nonAvailability.date"
                  }
                }
              }
            }
          }
        },
        {
          "nested": {
            "path": "nonAvailability",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "range": {
                      "nonAvailability.date" : {
                        "gte" : "2018-07-01",
                        "lt" : "2018-07-03",
                        "relation" : "within"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
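
To compare how the two queries treat parents with mixed children, here is a small Python sketch that models their matching logic. It is only a model under stated assumptions: the four sample parents and the function names are hypothetical, and every nested object is assumed to carry a date, so the exists clause is treated as "has at least one child".

```python
from datetime import date

GTE, LT = date(2018, 7, 1), date(2018, 7, 3)

# Hypothetical parents: id -> nested nonAvailability dates.
docs = {
    "empty":   [],                                  # no children at all
    "clear":   [date(2018, 8, 1)],                  # all children outside the range
    "mixed":   [date(2018, 7, 1), date(2018, 8, 1)],  # one inside, one outside
    "blocked": [date(2018, 7, 2)],                  # all children inside the range
}

def in_range(d):
    return GTE <= d < LT

def must_not_around_nested(docs):
    """First query: drop every parent that has any child inside the range."""
    return {p for p, ds in docs.items() if not any(in_range(d) for d in ds)}

def should_combination(docs):
    """Second query: parents with no children, OR parents with a child outside the range."""
    no_children = {p for p, ds in docs.items() if not ds}
    child_outside = {p for p, ds in docs.items() if any(not in_range(d) for d in ds)}
    return no_children | child_outside

first = must_not_around_nested(docs)
second = should_combination(docs)
print(sorted(first))   # ['clear', 'empty']
print(sorted(second))  # ['clear', 'empty', 'mixed']
```

In this model the only difference is the "mixed" parent: the first query excludes it, the second includes it. Which one fits depends on whether a single date inside the range should disqualify the parent.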
