Nested query, missing inner hits


(Benjamin Gathmann) #1

I have observed some strange behavior with nested queries that I do not understand. Here is an example query:

{
    "query" : {
        "nested" : {
            "path" : "first.nested.path",
            "query" : {
                "bool" : {
                    "filter" : [{
                            "term" : {
                                "first.nested.path.general.something" : "ABC"
                            }
                        }, {
                            "nested" : {
                                "path" : "first.nested.path.second.path",
                                "query" : {
                                    "bool" : {
                                        "filter" : [{
                                                "term" : {
                                                    "first.nested.path.second.path.value" : "Hello"
                                                }
                                            }, {
                                                "term" : {
                                                    "first.nested.path.second.path.value2" : "World"
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "inner_hits" : {
        "kids" : {
            "path" : {
                "first.nested.path" : {
                    "query" : {
                        "term" : {
                            "first.nested.path.general.something" : "ABC"
                        }
                    },
                    "_source" : false,
                    "fielddata_fields" : ["first.nested.path.general.something"],
                    "inner_hits" : {
                        "kids" : {
                            "path" : {
                                "first.nested.path.second.path" : {
                                    "query" : {
                                        "bool" : {
                                            "filter" : [{
                                                    "term" : {
                                                        "first.nested.path.second.path.value" : "Hello"
                                                    }
                                                }, {
                                                    "term" : {
                                                        "first.nested.path.second.path.value2" : "World"
                                                    }
                                                }
                                            ]
                                        }
                                    },
                                    "_source" : false,
                                    "fielddata_fields" : ["first.nested.path.second.path.value", "first.nested.path.second.path.value2"]
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

This query returns empty inner_hits for some results. But how is this possible?
I have checked the original data and found that the filters apply (so it is correct that the result is returned), but for some reason the inner_hits are not shown.
What are possible reasons for this?

I am also not quite sure about how to include query->bool->filter->term in the inner_hits definition. Should it be:

"query" : {
    "term" : {

(like above)
OR:

"query" : {
    "filtered" : {
        "filter" : {
            "term" : {

?

In general, this part of the query DSL is pretty awkward (see also:


(Benjamin Gathmann) #2

To better illustrate this behavior, I have created this Gist:


and the kind of data I receive for SOME of the hits:

Does anybody have an idea what is going on?
Is it a bug? Or is something wrong with my syntax?


(Benjamin Gathmann) #3

Here is the probable answer (but not solution yet) to my question:
It seems like ES stumbles over "null" values for objects when it retrieves inner_hits. I have added an example to my Gist here (you can see that the object "f" is null in the second "first" entry):

I am not sure what to do about this problem. It looks like a bug to me - I guess ES should check for null values when iterating over the documents. What is the way forward?


(Benjamin Gathmann) #4

It turned out that the problem lies elsewhere.
It has to do with the pretty insane syntax for top-level inner hits. Here is a related issue on GitHub:

The point is that with multiple levels of nesting, I have to repeat the query several times, i.e. not only for the deepest path, but also for every intermediate path in between.
Otherwise, the inner_hits for the intermediate level contain all objects at that level, even those that have no nested object inside matching the actual query.

Here is what my example query needs to look like:

As @RanadeepPolavarapu writes:
"Downside that It is more verbose, because the query needs to be repeated on each nested level."

I would be very happy if something could be done to make the syntax a little less "verbose".
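
For anyone who wants to avoid writing the repetition by hand, here is a rough sketch that builds the request body in Python so each filter is defined only once. This is not the Gist referenced above. The helper names (term, bool_filter, nested, inner_hits) are made up for illustration, and it assumes the top-level inner_hits syntax from my first post accepts a nested query inside the inner_hits query of the intermediate level.

```python
import json

# Hypothetical helpers (not part of any Elasticsearch client) that just build dicts.
def term(field, value):
    return {"term": {field: value}}

def bool_filter(*clauses):
    return {"bool": {"filter": list(clauses)}}

def nested(path, query):
    return {"nested": {"path": path, "query": query}}

def inner_hits(name, path, query, fields, children=None):
    # Mirrors the top-level inner_hits layout used in the first post.
    body = {"query": query, "_source": False, "fielddata_fields": fields}
    if children is not None:
        body["inner_hits"] = children
    return {name: {"path": {path: body}}}

OUTER = "first.nested.path"
INNER = "first.nested.path.second.path"

# Filter on the deepest level, defined once.
inner_query = bool_filter(
    term(INNER + ".value", "Hello"),
    term(INNER + ".value2", "World"),
)

# Filter on the intermediate level: its own term PLUS the nested inner filter.
# Reusing this for the intermediate inner_hits is the "repetition" described above.
outer_query = bool_filter(
    term(OUTER + ".general.something", "ABC"),
    nested(INNER, inner_query),
)

body = {
    "query": nested(OUTER, outer_query),
    "inner_hits": inner_hits(
        "kids", OUTER, outer_query,
        [OUTER + ".general.something"],
        children=inner_hits(
            "kids", INNER, inner_query,
            [INNER + ".value", INNER + ".value2"],
        ),
    ),
}

print(json.dumps(body, indent=4))
```

The only difference from the query in my first post is that the intermediate inner_hits now uses the full outer_query (term plus nested filter) instead of the bare term filter.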

