Hi All,
I am using Elasticsearch 1.7 and I realize that it's high time to upgrade (better late than never).
However, from Elasticsearch 5.x onwards I noticed a breaking change that is making a smooth upgrade difficult. Starting with 5.x, when source filtering finds no matching nested element in an array, ES omits the array entirely, whereas ES 1.7 returned an empty array []. The example below illustrates this.
Consider the following document:
{
"array": [
{
"name": "abc"
}
]
}
If we run a simple query with "_source": ["array.title"], then
ES 1.7 returns:
[
{ "array": [] }
]
whereas ES 5.x and later return:
[
{ }
]
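To make the difference concrete, here is a toy Python model of single-path source filtering. It is reconstructed only from the outputs described in this post and is not Elasticsearch's actual implementation; the function names are hypothetical. `keep_empty=True` mimics the 1.7 output, `keep_empty=False` the 5.x+ output.

```python
from typing import Any, List, Optional


def _filter(value: Any, path: List[str], keep_empty: bool) -> Optional[Any]:
    """Apply one include path to `value`; None means "drop from parent"."""
    if isinstance(value, list):
        # Arrays are transparent: the same path applies to every element,
        # and elements left with nothing matched are removed.
        items = [_filter(v, path, keep_empty) for v in value]
        kept = [v for v in items if v is not None and v != {}]
        return kept if (kept or keep_empty) else None
    if not path:
        return value  # include path fully matched: keep this value as-is
    if not isinstance(value, dict):
        return None  # path continues but value is a scalar: no match
    head, rest = path[0], path[1:]
    child = _filter(value[head], rest, keep_empty) if head in value else None
    if child is None:
        # 1.7-style keeps an (empty) object; 5.x-style drops it entirely
        return {} if keep_empty else None
    return {head: child}


def filter_source(doc: dict, include: str, keep_empty: bool) -> dict:
    """Toy model of `"_source": [include]` for a single dotted include path."""
    result = _filter(doc, include.split("."), keep_empty)
    return result if result is not None else {}


doc = {"array": [{"name": "abc"}]}
print(filter_source(doc, "array.title", keep_empty=True))   # {'array': []}
print(filter_source(doc, "array.title", keep_empty=False))  # {}
```

In this model the only difference is whether a container left empty after filtering survives; everything else about the two outputs is identical.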
Consider the following scenario.
POST /test/_doc/1
{
"field":"value",
"array":[],
"object":{}
}
POST /test/_doc/2
{
"field":"value",
"array":[ { "exclude":"bar" } ],
"object":{ "exclude":"bar" }
}
POST /test/_doc/3
{
"field" : "value",
"array": [{ "exclude": "bar"}, {"include" : "bar"}],
"object": { "exclude": "bar", "include" : "bar" }
}
POST /test/_search?pretty
{
"_source": {"includes": ["name","object.include","array.include"] }
}
Response:
{
"hits":[
{
"_index":"test",
"_type":"_doc",
"_id":"82HEBnQBxtHMu64O7lFB",
"_score":1,
"_source":{
}
},
{
"_index":"test",
"_type":"_doc",
"_id":"9GHFBnQBxtHMu64OElFU",
"_score":1,
"_source":{
}
},
{
"_index":"test",
"_type":"_doc",
"_id":"9WHFBnQBxtHMu64ONFGE",
"_score":1,
"_source":{
"array":[
{
"include":"bar"
}
],
"object":{
"include":"bar"
}
}
}
]
}
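For top-level fields like `array` and `object`, a client can put the missing empty containers back itself. A minimal sketch, assuming the client knows which top-level containers it expects (the helper name is hypothetical):

```python
from typing import Any, Callable, Dict


def restore_empty_fields(
    source: Dict[str, Any], defaults: Dict[str, Callable[[], Any]]
) -> Dict[str, Any]:
    """Return a copy of `source` with omitted top-level fields set to empty values.

    `defaults` maps a field name to a factory for its empty value,
    e.g. {"array": list, "object": dict}. Nested paths are not handled.
    """
    restored = dict(source)  # shallow copy so the original hit is untouched
    for field, make_empty in defaults.items():
        restored.setdefault(field, make_empty())
    return restored


# _source of the three hits in the response above
hits = [{}, {}, {"array": [{"include": "bar"}], "object": {"include": "bar"}}]
for src in hits:
    print(restore_empty_fields(src, {"array": list, "object": dict}))
```

This only works for top-level fields whose presence is known in advance; it cannot tell how many elements were dropped from inside a nested array.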
Shouldn't the first two results include the empty arrays/objects as well? Omitting them creates a bigger problem in nested scenarios, as shown below:
PUT /test1
{
"mappings":{
"properties":{
"product":{
"type":"text"
},
"Users":{
"type":"nested",
"properties":{
"name":{
"type":"text"
},
"Address":{
"type":"nested",
"properties":{
"country":{
"type":"text"
}
}
}
}
}
}
}
}
POST /test1/_doc/1
{
"product":"Laptop",
"Users":[
{
"name":"user1",
"Address":[
{
"country":"country1"
}
]
},
{
"name":"user2",
"Address":[
]
},
{
"name":"user3",
"Address":[
{
"country":"country2"
}
]
},
{
"name":"user4",
"Address":[
{
"country":"country3"
}
]
}
]
}
POST /test1/_doc/_search
{
"query":{
"nested":{
"path":"Users.Address",
"inner_hits":{
},
"query":{
"match":{
"Users.Address.country":"country2"
}
}
}
},
"_source":[
"Users.Address.country"
]
}
Response:
{
"hits":{
"total":{
"value":1,
"relation":"eq"
},
"max_score":0.9808291,
"hits":[
{
"_index":"test1",
"_type":"_doc",
"_id":"1",
"_score":0.9808291,
"_source":{
"Users":[
{
"Address":[
{
"country":"country1"
}
]
},
{
"Address":[
{
"country":"country2"
}
]
},
{
"Address":[
{
"country":"country3"
}
]
}
]
},
"inner_hits":{
"Users.Address":{
"hits":{
"total":{
"value":1,
"relation":"eq"
},
"max_score":0.9808291,
"hits":[
{
"_index":"test1",
"_type":"_doc",
"_id":"1",
"_nested":{
"field":"Users",
"offset":2,
"_nested":{
"field":"Address",
"offset":0
}
},
"_score":0.9808291,
"_source":{
"country":"country2"
}
}
]
}
}
}
}
]
}
}
In the response above, inner_hits reports Users[2].Address[0] as the match for country = "country2". However, in the filtered _source that user appears at index 1 of the Users array, because user2 (whose Address array is empty) has been dropped. As a result, the offset attribute can no longer be used reliably against _source. In Elasticsearch versions before 5.x we got an empty [] when no nested attribute was found, so the offsets lined up and could be used reliably to pull data from the _source object.
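The mismatch can be reproduced client-side by following the `_nested` chain from the inner hit into each version of `_source`. This sketch assumes the offsets index into the original, unfiltered arrays, which is what the response above suggests; `resolve_nested` is a hypothetical helper, and both documents are copied from the request and response above.

```python
from typing import Any, Dict, Optional


def resolve_nested(source: Dict[str, Any], nested: Dict[str, Any]) -> Any:
    """Follow an inner hit's `_nested` chain of field/offset pairs into `source`."""
    node: Any = source
    identity: Optional[Dict[str, Any]] = nested
    while identity is not None:
        node = node[identity["field"]][identity["offset"]]
        identity = identity.get("_nested")  # next (deeper) nesting level, if any
    return node


nested = {"field": "Users", "offset": 2, "_nested": {"field": "Address", "offset": 0}}

full_source = {  # document 1 as indexed
    "product": "Laptop",
    "Users": [
        {"name": "user1", "Address": [{"country": "country1"}]},
        {"name": "user2", "Address": []},
        {"name": "user3", "Address": [{"country": "country2"}]},
        {"name": "user4", "Address": [{"country": "country3"}]},
    ],
}
filtered_source = {  # _source returned with "_source": ["Users.Address.country"]
    "Users": [
        {"Address": [{"country": "country1"}]},
        {"Address": [{"country": "country2"}]},
        {"Address": [{"country": "country3"}]},
    ]
}

print(resolve_nested(full_source, nested))      # {'country': 'country2'}
print(resolve_nested(filtered_source, nested))  # {'country': 'country3'}
```

Against the unfiltered document the offsets land on the matching address; against the filtered one they silently land on the wrong user's address, which is exactly the reliability concern described above.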