Source Filtering does not return empty Array if no nested properties are found

Hi All,
I am still on Elasticsearch 1.7 and I realize it is high time to upgrade (better late than never).
However, from Elasticsearch 5.x onwards I noticed a breaking change that is making a smooth upgrade difficult. Starting with 5.x, source filtering omits an array entirely if none of its elements contain the requested field, whereas ES 1.7 returned an empty array [ ]. The example below shows the behavior.

Consider the following document,

    {
        "array": [
             {
                 "name": "abc"
             }
         ]
    }

If we run a simple query with _source: ["array.title"], then
ES 1.7 returns

    [
        { "array": [] }
    ]

while ES 5.x and later return

    [
        { }
    ]
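
To make the difference concrete, here is a toy Python model of the two behaviors. It is only a sketch of what I observe, not Elasticsearch's actual implementation; the names filter_source and keep_empty_arrays are made up for illustration, and it only handles plain dotted include paths (no wildcards, no excludes).

```python
def _filter(node, paths, keep_empty_arrays):
    """Return the filtered copy of `node`, or None if nothing survives."""
    if isinstance(node, list):
        kept = [_filter(item, paths, keep_empty_arrays) for item in node]
        kept = [item for item in kept if item is not None]
        # 1.7-like mode keeps the array even when every element was pruned.
        return kept if (kept or keep_empty_arrays) else None
    if isinstance(node, dict):
        out = {}
        for key, child in node.items():
            # Include paths that continue through this key.
            rest = [p[1:] for p in paths if p[0] == key]
            if not rest:
                continue
            if any(not p for p in rest):
                out[key] = child  # a path ends here: keep the whole value
                continue
            sub = _filter(child, rest, keep_empty_arrays)
            if sub is not None:
                out[key] = sub
        return out or None
    return None  # scalar reached before the include path was exhausted


def filter_source(doc, includes, keep_empty_arrays):
    """Hypothetical helper: apply dotted include paths to a document."""
    paths = [inc.split(".") for inc in includes]
    return _filter(doc, paths, keep_empty_arrays) or {}


doc = {"array": [{"name": "abc"}]}
print(filter_source(doc, ["array.title"], keep_empty_arrays=True))   # {'array': []}
print(filter_source(doc, ["array.title"], keep_empty_arrays=False))  # {}
```

With keep_empty_arrays=True the model reproduces the 1.7 result above, and with keep_empty_arrays=False it reproduces the 5.x result.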

Consider the following scenario.

POST /test/_doc/1
    {
        "field":"value",
        "array":[],
        "object":{}
    }

POST /test/_doc/2
    {
        "field":"value",
        "array":[ { "exclude":"bar" } ],
        "object":{ "exclude":"bar" }
    }

POST /test/_doc/3
    {
        "field":"value",
        "array":[ { "exclude":"bar" }, { "include":"bar" } ],
        "object":{ "exclude":"bar", "include":"bar" }
    }

POST /test/_search?pretty
    {
        "_source":{ "includes":["name","object.include","array.include"] }
    }

Response:
    {
        "hits":[
            {
                "_index":"test",
                "_type":"_doc",
                "_id":"82HEBnQBxtHMu64O7lFB",
                "_score":1,
                "_source":{
                    
                }
            },
            {
                "_index":"test",
                "_type":"_doc",
                "_id":"9GHFBnQBxtHMu64OElFU",
                "_score":1,
                "_source":{
                    
                }
            },
            {
                "_index":"test",
                "_type":"_doc",
                "_id":"9WHFBnQBxtHMu64ONFGE",
                "_score":1,
                "_source":{
                    "array":[
                        {
                            "include":"bar"
                        }
                    ],
                    "object":{
                        "include":"bar"
                    }
                }
            }
        ]
    }

Shouldn't the first two results include the empty arrays/objects as well? Omitting them creates a bigger problem in the nested scenario shown below.

PUT /test1
    {
        "mappings":{
            "properties":{
                "product":{
                    "type":"text"
                },
                "Users":{
                    "type":"nested",
                    "properties":{
                        "name":{
                            "type":"text"
                        },
                        "Address":{
                            "type":"nested",
                            "properties":{
                                "country":{
                                    "type":"text"
                                }
                            }
                        }
                    }
                }
            }
        }
    }

POST /test1/_doc/1
    {
        "product":"Laptop",
        "Users":[
            {
                "name":"user1",
                "Address":[
                    {
                        "country":"country1"
                    }
                ]
            },
            {
                "name":"user2",
                "Address":[
                    
                ]
            },
            {
                "name":"user3",
                "Address":[
                    {
                        "country":"country2"
                    }
                ]
            },
            {
                "name":"user4",
                "Address":[
                    {
                        "country":"country3"
                    }
                ]
            }
        ]
    }

POST /test1/_doc/_search
    {
        "query":{
            "nested":{
                "path":"Users.Address",
                "inner_hits":{
                    
                },
                "query":{
                    "match":{
                        "Users.Address.country":"country2"
                    }
                }
            }
        },
        "_source":[
            "Users.Address.country"
        ]
    }

Response
    {
        "hits":{
            "total":{
                "value":1,
                "relation":"eq"
            },
            "max_score":0.9808291,
            "hits":[
                {
                    "_index":"test1",
                    "_type":"_doc",
                    "_id":"1",
                    "_score":0.9808291,
                    "_source":{
                        "Users":[
                            {
                                "Address":[
                                    {
                                        "country":"country1"
                                    }
                                ]
                            },
                            {
                                "Address":[
                                    {
                                        "country":"country2"
                                    }
                                ]
                            },
                            {
                                "Address":[
                                    {
                                        "country":"country3"
                                    }
                                ]
                            }
                        ]
                    },
                    "inner_hits":{
                        "Users.Address":{
                            "hits":{
                                "total":{
                                    "value":1,
                                    "relation":"eq"
                                },
                                "max_score":0.9808291,
                                "hits":[
                                    {
                                        "_index":"test1",
                                        "_type":"_doc",
                                        "_id":"1",
                                        "_nested":{
                                            "field":"Users",
                                            "offset":2,
                                            "_nested":{
                                                "field":"Address",
                                                "offset":0
                                            }
                                        },
                                        "_score":0.9808291,
                                        "_source":{
                                            "country":"country2"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }

In the response above, inner_hits reports Users[2].Address[0] as the match for the condition country = "country2". However, in the filtered _source object that user appears at index 1 of the Users array, because user2 (whose Address array is empty) was dropped. This makes the offset attribute unreliable. In Elasticsearch versions prior to 5.x, we got an empty [ ] when no nested field was found, so offsets could be used reliably to pull data from the _source object.
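
The mismatch can be reproduced client-side with a few lines of Python. This is a minimal sketch using the data from the request and response above; resolve is a hypothetical helper that simply walks the _nested field/offset chain of an inner hit.

```python
def resolve(source, nested):
    """Follow an inner hit's _nested field/offset chain into a _source dict."""
    node = source
    while nested is not None:
        node = node[nested["field"]][nested["offset"]]
        nested = nested.get("_nested")
    return node


# The _nested block reported by inner_hits in the response above.
nested = {"field": "Users", "offset": 2,
          "_nested": {"field": "Address", "offset": 0}}

# The document as indexed (user2 has an empty Address array).
original = {"Users": [
    {"name": "user1", "Address": [{"country": "country1"}]},
    {"name": "user2", "Address": []},
    {"name": "user3", "Address": [{"country": "country2"}]},
    {"name": "user4", "Address": [{"country": "country3"}]},
]}

# The filtered _source as returned in the response (user2 is missing).
filtered = {"Users": [
    {"Address": [{"country": "country1"}]},
    {"Address": [{"country": "country2"}]},
    {"Address": [{"country": "country3"}]},
]}

print(resolve(original, nested))  # {'country': 'country2'}  (the real match)
print(resolve(filtered, nested))  # {'country': 'country3'}  (wrong element)
```

Applied to the indexed document the offsets point at the matching address, but applied to the filtered _source they silently land on a different user.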
