Disabling _source and using stored fields

Hi,

I have a use case with very large documents (over 10 MB each) where the
metadata (title, author, etc.) is small.
I thought it would be beneficial to separate the body from the metadata and
use different fields for them, because a result list typically only needs
the metadata.
So I disabled the _source field and stored both the body and the metadata
as fields.

However, while storing and indexing works as expected, I'm not able to get
my data back.
Complex fields (i.e. objects) cannot be retrieved; requesting one throws an exception:
ElasticsearchIllegalArgumentException[field [s3] isn't a leaf field]

Is this approach unsupported? Am I doing something wrong?
A small example:

curl -XPOST 'localhost:9200/.test?pretty=true' -d '{
  "mappings": {
    "test": {
      "_source": { "enabled": false },
      "properties": {
        "s1": { "type": "integer", "index": "no", "store": "yes" },
        "s2": { "type": "integer", "index": "no", "store": "yes" },
        "s3": { "type": "object", "index": "no", "store": "yes" }
      }
    }
  }
}'

curl -XPOST 'localhost:9200/.test/test/1' -d '{
  "s1": 123,
  "s2": [1, 2, 3, 4, 5],
  "s3": [{ "x": 1, "y": 2, "z": 3 }]
}'

sleep 1

# will succeed
curl -XPOST 'localhost:9200/.test/_search?pretty=true' -d '{
  "fields": ["s1", "s2"]
}'

# will fail with "ElasticsearchIllegalArgumentException[field [s3] isn't a leaf field]"
curl -XPOST 'localhost:9200/.test/_search?pretty=true' -d '{
  "fields": ["s3"]
}'

I am aware that I could keep the _source field and just exclude the body
from it. But I expect that fetching the complete _source is costly.
I would like to measure the impact of the two solutions.
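
For reference, excluding the body from the stored _source is a mapping-level setting. The following is only a minimal sketch, assuming Elasticsearch 1.x (the version family used in this thread); the index name `.test2` and the `title` and `body` fields are hypothetical and not part of the example above.

```shell
#!/bin/sh
ES="${ES:-localhost:9200}"   # assumed node address, same as in the example above
INDEX="${INDEX:-.test2}"     # hypothetical index name

# _source stays enabled, but the large "body" field (hypothetical name) is
# removed from the document before _source is stored. The field itself is
# still indexed, so it remains searchable.
MAPPING='{
  "mappings": {
    "test": {
      "_source": { "excludes": ["body"] },
      "properties": {
        "title": { "type": "string" },
        "body":  { "type": "string" }
      }
    }
  }
}'

# Only contact the node when RUN=1, so the script can be read or sourced safely.
if [ "${RUN:-0}" = "1" ]; then
  curl -XPOST "$ES/$INDEX?pretty=true" -d "$MAPPING"
fi
```

The trade-off with this variant is that the excluded body can no longer be read back from _source at all, so it only fits if the body is kept somewhere else as well.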

Thanks,
Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/525eb286-8a20-43cb-ba79-aab07e58b4a4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Peter,

Unfortunately, Elasticsearch does not support storing object fields. The
mapping you sent was accepted only because mapping parsing is lenient and
ignores unknown parameters.

In your case, I think an option could be to keep _source enabled and also
store the small metadata fields on their own. Compression should keep the
disk-space overhead minimal. When you only need the metadata fields, use the
fields option; when you need entire objects or unstored fields, use source
filtering.
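
A rough sketch of what this could look like with the fields from the example, assuming Elasticsearch 1.x request syntax. The index name `.test3` is hypothetical, and the `RUN` guard is only there so the script does nothing unless a node is available.

```shell
#!/bin/sh
ES="${ES:-localhost:9200}"   # assumed node address, as in the original example
INDEX="${INDEX:-.test3}"     # hypothetical index name

# _source is left enabled (the default). Only the small leaf fields s1 and s2
# are stored on their own; the object field s3 is not stored separately.
MAPPING='{
  "mappings": {
    "test": {
      "properties": {
        "s1": { "type": "integer", "index": "no", "store": "yes" },
        "s2": { "type": "integer", "index": "no", "store": "yes" },
        "s3": { "type": "object" }
      }
    }
  }
}'

# Result list: request the stored fields only.
META_QUERY='{ "fields": ["s1", "s2"] }'

# Object field: use source filtering to extract s3 from _source.
OBJECT_QUERY='{ "_source": "s3.*" }'

if [ "${RUN:-0}" = "1" ]; then
  curl -XPOST "$ES/$INDEX?pretty=true" -d "$MAPPING"
  curl -XPOST "$ES/$INDEX/_search?pretty=true" -d "$META_QUERY"
  curl -XPOST "$ES/$INDEX/_search?pretty=true" -d "$OBJECT_QUERY"
fi
```

Source filtering still has to load and parse the whole _source on the server; it only reduces what is sent back, so the cost for very large documents is worth measuring.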

On Tue, Nov 11, 2014 at 7:27 AM, Peter van der Weerd pw2@bitmanager.nl
wrote:

--
Adrien Grand
