1.7-->2.3 migration: nestedFilter "join": false property?

Hello all!

I'm working on upgrading a project from ES v1.7 to ES 2.3. The project makes heavy use of nested filters, and when used in aggregates, the project sets the "join" property to false so that the matching nested docs are emitted. To refresh your memory about this property: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-nested-filter.html#_join_option

What's the approach in 2.3 to achieve the same result? The nested query in 2.3 (and in 1.7, for that matter) does not have a "join" property to control which hits are emitted.

Thanks for your time and consideration!

eric

Hi Eric,

I was able to get some time from the great @mvg to discuss this one because I was unfamiliar with the setting altogether.

Setting "join": false on a nested query is redundant, which is why it was removed. To be more explicit: if you don't want the join behavior of nested functionality, which is what setting join to false disables, then simply don't use the nested query at all.

"nested" : {
  "path" : "offers",
  "query" : {
    "match" : {
      "offers.color" : "blue"
    }
  },
  "join" : false
}

becomes

"query" : {
  "match" : {
    "offers.color" : "blue"
  }
}

In some cases, it might be necessary to map your nested fields with include_in_parent.

Hope that helps,
Chris

Hey Chris! Thanks for your reply.

I feel like one of us has this backwards. The 1.7 docs say:

The nested filter also supports a join option which controls whether to perform the block join or not. By default, it’s enabled. But when it’s disabled, it emits the hidden nested documents as hits instead of the joined root document.

Which is the opposite of your example.

My understanding is that in 1.7, setting "join":true returns the top-level parent doc, whereas "join":false returns the inner/nested document.

Are you saying that "nested" queries, in 2.3, now always return the top-level parent doc?

If yes, and the answer is really to just not use "nested" (and use include_in_parent instead), then the semantics of searching nested objects, especially with complex bool queries, must be broken in 2.3?

For example, these are two very different queries:

{
  "nested" : {
    "query" : {
      "bool" : {
        "must" : [ {
          "term" : {
            "review_data.review_data_id" : 67115
          }
        }, {
          "terms" : {
            "review_data.responsiveness" : [ "responsive", "potentially responsive", "not responsive", "unreviewable" ]
          }
        } ]
      }
    },
    "path" : "review_data"
  }
}

and

{
  "bool" : {
    "must" : [ {
      "term" : {
        "review_data.review_data_id" : 67115
      }
    }, {
      "terms" : {
        "review_data.responsiveness" : [ "responsive", "potentially responsive", "not responsive", "unreviewable" ]
      }
    } ]
  }
}

Assume that review_data is an array of objects...

The former finds all docs where at least one review_data element has review_data_id:67115 and that same element also has one of those responsiveness values.

The latter, you'd think, does the same thing, but it doesn't take into consideration that review_data is an array of elements. So it could match a doc where review_data[0].review_data_id is 67115 but the matching responsiveness value, say "responsive", sits on review_data[7].

These are very different things.
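
To make the difference concrete, here is a toy Python model of the two matching behaviors. It is only a sketch of the semantics, not Elasticsearch code: the sample document and the function names are made up for illustration, and the "flattened" case assumes the review_data fields are visible on the parent document (object mapping, or nested with include_in_parent).

```python
# Hypothetical document: the id and the wanted responsiveness value
# live on DIFFERENT elements of the review_data array.
doc = {
    "review_data": [
        {"review_data_id": 67115, "responsiveness": "privileged"},
        {"review_data_id": 99999, "responsiveness": "responsive"},
    ]
}

WANTED = {"responsive", "potentially responsive", "not responsive", "unreviewable"}


def nested_match(doc):
    """Nested-query semantics: both clauses must hold on the same element."""
    return any(
        r["review_data_id"] == 67115 and r["responsiveness"] in WANTED
        for r in doc["review_data"]
    )


def flattened_match(doc):
    """Plain bool query on flattened fields: each clause may be
    satisfied by a different element of the array."""
    ids = {r["review_data_id"] for r in doc["review_data"]}
    values = {r["responsiveness"] for r in doc["review_data"]}
    return 67115 in ids and bool(values & WANTED)


print(nested_match(doc))     # False: no single element satisfies both clauses
print(flattened_match(doc))  # True: the clauses match across two elements
```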

As such, without the ability to set "join":false, the aggregate would be collecting the wrong set of values.

Am I being dense?

Thanks for your time!

eric

I think there is a misunderstanding here.

The join option existed solely for facets/aggregations, for the case where a filter aggregation is specified under a nested aggregation. It never made sense to use this option on a nested query inside the main query.

If a filter aggregation is inside a nested aggregation, then wrapping its query in a nested query with join set to false is pointless: putting that query directly into the filter aggregation has the same effect. That is why the join option was removed. In this case you shouldn't use a nested query at all; just use the actual query/filter directly.
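
As a rough sketch of what that could look like for the review_data example earlier in this thread, the snippet below builds a 2.x-style request body as a Python dict and prints it as JSON. The aggregation names (reviews, matching_reviews, by_responsiveness) and the terms sub-aggregation are assumptions made for illustration; the thread does not say what is actually being aggregated.

```python
import json

# The same clauses that previously sat inside nested + "join": false,
# now used directly with no nested wrapper.
inner_query = {
    "bool": {
        "must": [
            {"term": {"review_data.review_data_id": 67115}},
            {"terms": {"review_data.responsiveness": [
                "responsive", "potentially responsive",
                "not responsive", "unreviewable",
            ]}},
        ]
    }
}

body = {
    "size": 0,  # only the aggregation results are of interest here
    "aggs": {
        "reviews": {  # hypothetical name
            # The nested aggregation switches context to the nested docs.
            "nested": {"path": "review_data"},
            "aggs": {
                "matching_reviews": {  # hypothetical name
                    # Evaluated per nested doc, so both clauses must
                    # hold on the same review_data element.
                    "filter": inner_query,
                    "aggs": {
                        "by_responsiveness": {  # assumed sub-aggregation
                            "terms": {"field": "review_data.responsiveness"}
                        }
                    },
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

Because the filter runs inside the nested aggregation's context, it is applied to each nested document on its own, which is the effect "join": false used to give.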

Does this make sense?
