Strange behaviour of the _source.excludes mapping

If I start ES with the basic (almost empty) default mapping, I can issue
the following commands:
[ localhost:~ ]> curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search",
"emptyObject" : {}
}'
{"ok":true,"_index":"twitter","_type":"tweet","_id":"1","_version":1}
[ localhost:~ ]> curl -XGET 'http://localhost:9200/twitter/tweet/1'
{"_index":"twitter","_type":"tweet","_id":"1","_version":1,"exists":true,
"_source" : {
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search",
"emptyObject" : {}
}}

Notice that the empty object is returned exactly as it was sent.
But if I set the default mapping as follows:
{
    "_default_" : {
        "_source" : {
            "excludes" : ["test"]
        }
    }
}

then
[ localhost:~ ]> curl -XGET 'http://localhost:9200/twitter/tweet/1'
{"_index":"twitter","_type":"tweet","_id":"1","_version":1,"exists":true,
"_source" : {"message":"trying out Elastic Search","user":"kimchy","post_date":"2009-11-15T14:12:12"}}

The empty object has been removed from the _source!!!
I've made other tests showing that it works only on root fields.
Could someone confirm and explain?

When the "excludes" or "includes" lists are present, the source goes through
a parse/filter/serialize process before it is saved as the _source field:
the source text is first parsed into a map, then this map is filtered into
another map, and finally the new map is serialized back into text. If the
source is already in the correct format and neither "excludes" nor
"includes" is present, the source is stored as is.
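A minimal Python sketch of the two code paths described above. It is not Elasticsearch's code: `store_source` is a hypothetical helper, the input is assumed to already be JSON, and the filter only looks at top-level keys and does not drop empty objects, so that it isolates the parse/filter/serialize step itself.

```python
import json

def store_source(source_text, includes=None, excludes=None):
    """Hypothetical model of how the stored _source text is produced."""
    if not includes and not excludes:
        # No filtering configured: the original text is kept verbatim.
        return source_text
    # 1. Parse the source text into a map.
    source_map = json.loads(source_text)
    # 2. Filter it into a new map (top-level keys only in this sketch).
    filtered = {
        key: value
        for key, value in source_map.items()
        if key not in (excludes or []) and (not includes or key in includes)
    }
    # 3. Serialize the new map back into text.
    return json.dumps(filtered, separators=(",", ":"))

doc = '{\n"user" : "kimchy",\n"emptyObject" : {}\n}'
print(store_source(doc) == doc)              # True: stored as is
print(store_source(doc, excludes=["test"]))  # {"user":"kimchy","emptyObject":{}}
```

Even when nothing is actually excluded, the filtered path rebuilds the text from a map, so the original whitespace (and potentially key order) is not preserved.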

While elements are copied from one map into the other, the filtering process
omits excluded fields as well as empty fields
(https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/xcontent/support/XContentMapValues.java#L183).
I would guess this is done to remove higher-level elements that lost all of
their nested elements as a result of the filtering. Moreover, since empty
elements are not indexed anyway, it makes little sense to keep them in the
source.
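The pruning rule can be modelled as a recursive copy. This Python sketch is based only on the description above and is not a port of XContentMapValues.java: `filter_map` is a hypothetical name, excludes are assumed to be exact dotted paths (no wildcards), and lists are not handled. Any object that is empty after filtering is dropped, whether an exclude emptied it or it was already empty in the input, which is how `emptyObject` disappears.

```python
def filter_map(source, excludes, prefix=""):
    """Copy `source` into a new dict, omitting excluded paths and any
    object that is empty after filtering (sketch, not the real code)."""
    result = {}
    for key, value in source.items():
        path = prefix + key
        if path in excludes:
            continue  # explicitly excluded field
        if isinstance(value, dict):
            value = filter_map(value, excludes, path + ".")
            if not value:
                continue  # empty object: dropped, even if it was empty originally
        result[key] = value
    return result

doc = {"user": "kimchy", "emptyObject": {},
       "meta": {"secret": 1}, "nested": {"inner": {}, "keep": 2}}
print(filter_map(doc, {"test"}))
# {'user': 'kimchy', 'meta': {'secret': 1}, 'nested': {'keep': 2}}
print(filter_map(doc, {"meta.secret"}))
# {'user': 'kimchy', 'nested': {'keep': 2}}
```

The second call shows the case Igor guesses the rule is meant for: "meta" only becomes empty because its sole child was excluded, and it is pruned along with it.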

(In reply to Virgile Devaux's message of Friday, August 10, 2012 4:55:17 AM UTC-4, shown at the top of this thread.)

--

OK, thanks, Igor, for this very in-depth response.
I'm not sure that an empty field, even if not indexed, is the same as no
field at all, but it is not a problem for us for now. Maybe I'll make a
pull request for this sometime.
Thanks again.
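The distinction between an empty field and a missing field could be kept by pruning only objects that become empty because of filtering, while leaving objects that were already empty in the input. A hypothetical Python variant of such a filter (`filter_map_keep_empty` is an illustrative name; this is not an existing Elasticsearch option or patch), with the same simplifying assumptions of exact dotted-path excludes and no list handling:

```python
def filter_map_keep_empty(source, excludes, prefix=""):
    """Like a pruning filter, but keeps objects that were empty in the input."""
    result = {}
    for key, value in source.items():
        path = prefix + key
        if path in excludes:
            continue  # explicitly excluded field
        if isinstance(value, dict):
            filtered = filter_map_keep_empty(value, excludes, path + ".")
            # Prune only objects that *became* empty because of filtering.
            if value and not filtered:
                continue
            value = filtered
        result[key] = value
    return result

doc = {"user": "kimchy", "emptyObject": {},
       "meta": {"secret": 1}, "nested": {"inner": {}, "keep": 2}}
print(filter_map_keep_empty(doc, {"meta.secret"}))
# {'user': 'kimchy', 'emptyObject': {}, 'nested': {'inner': {}, 'keep': 2}}
```

Here "emptyObject" and "nested.inner" survive because they were sent empty, while "meta" is still removed because filtering emptied it.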

(In reply to Igor Motov's message of Friday, August 10, 2012 20:37:58 UTC+2, shown above.)
--