What does it mean to "store" a field?

Throughout the documentation on the website, the "store" option is
mentioned, e.g.:

"The field is stored in the index"
"Set to yes the store actual field in the index, no to not store it."
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

What are the consequences of storing, or not storing, a field in the index?

My guess is that an unstored field can't be queried, but will be returned
when retrieving the document.

Thanks guys.

When you index a field, you can search it.
If you store it, you can display the content of this field if your document matches.

But if you store the whole document (source), you can also display it.

So an unstored field can be queried but cannot be displayed if you have also disabled source.

This is how I understand it.

Correct me if I'm wrong...

HTH
David :wink:
@dadoonet

(In reply to Nick Hoffman, 16 Nov 2011 at 23:43)

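David's summary (indexed means searchable; a stored field or the _source means you can display it) can be made concrete with a toy, in-memory model. This is an illustrative sketch only, not how Elasticsearch or Lucene actually store data; the class ToyIndex and its methods are hypothetical names invented for this example.

```python
# Toy, in-memory model of the behavior described above. It is NOT how
# Elasticsearch or Lucene work internally; it only mirrors what a user sees:
# indexed -> searchable, stored -> returned on its own, _source -> whole doc.
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    source_enabled: bool = True
    stored_fields: set = field(default_factory=set)
    # (field name, term) -> ids of documents containing that term
    postings: dict = field(default_factory=dict)
    # doc id -> {field name: value} for fields mapped as stored
    stored: dict = field(default_factory=dict)
    # doc id -> original document, kept only if _source is enabled
    sources: dict = field(default_factory=dict)

    def index(self, doc_id, doc):
        for name, value in doc.items():
            # Every field is indexed, i.e. searchable, in this toy.
            for term in str(value).lower().split():
                self.postings.setdefault((name, term), set()).add(doc_id)
            if name in self.stored_fields:
                self.stored.setdefault(doc_id, {})[name] = value
        if self.source_enabled:
            self.sources[doc_id] = dict(doc)

    def search(self, name, term):
        return self.postings.get((name, term.lower()), set())

    def get_field(self, doc_id, name):
        # A stored copy wins; otherwise fall back to extracting from _source.
        if name in self.stored.get(doc_id, {}):
            return self.stored[doc_id][name]
        if doc_id in self.sources:
            return self.sources[doc_id].get(name)
        return None  # searchable, but nothing left to display


idx = ToyIndex(source_enabled=False, stored_fields={"title"})
idx.index(1, {"title": "Hello world", "body": "store me maybe"})
print(idx.search("body", "store"))  # {1}: body is searchable...
print(idx.get_field(1, "body"))     # None: ...but cannot be displayed
print(idx.get_field(1, "title"))    # Hello world: stored on its own
```

In this model the body field can still be found by a search, but with _source disabled and no stored copy there is nothing to return for it, which is exactly the case David describes.
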
Great question, I'd like to get a good understanding of this too. One
thing is, I think if you're going to highlight fields, then it is
better to store them too, so that they don't have to be parsed from the
source for the highlighting. That at least was my impression after
reading http://www.elasticsearch.org/guide/reference/api/search/highlighting.html

(In reply to David Pilato, Nov 16, 7:17 pm)

Cool, that makes sense. Where/how does one configure whether or not the
whole document is stored? I looked around on the ES website, but couldn't
find this detail.

http://www.elasticsearch.org/guide/reference/mapping/source-field.html

David :wink:
@dadoonet

(In reply to Nick Hoffman, 17 Nov 2011 at 05:57)

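As a sketch of where this setting lives: the _source behavior is part of a type's mapping, and the mapping body would be sent with the put mapping API. The Python below only builds the JSON bodies for three variants (stored, not stored, compressed). The type name "tweet" is hypothetical, and the option names follow the _source mapping page linked above as it stood at the time; check them against the documentation for your version, since later releases dropped the compress option.

```python
import json

# Hypothetical type name "tweet". Option names follow the _source mapping
# docs of that era ("enabled", "compress"); newer releases removed "compress".
source_default = {"tweet": {"_source": {"enabled": True}}}      # stored (the default)
source_disabled = {"tweet": {"_source": {"enabled": False}}}    # not stored
source_compressed = {"tweet": {"_source": {"compress": True}}}  # stored, compressed

for body in (source_default, source_disabled, source_compressed):
    print(json.dumps(body))
```
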
Heya,

By default in elasticsearch, the _source (the document as you indexed it) is
stored. This means when you search, you can get the actual document source
back. Moreover, elasticsearch will automatically extract fields / objects
from the _source and return them if you explicitly ask for it (as well as
possibly use it in other components, like highlighting).

You can specify that a specific field is also stored. This means that
the data for that field will be stored "on its own". For example, if you
ask for "field1" (which is stored), elasticsearch will see that it's
stored, and load it from the index instead of getting it from the _source
(assuming _source is enabled).

When do you want to enable storing specific fields? Most times, you
don't. Fetching the _source is fast and extracting it is fast as well. If
you have very large documents, where the cost of storing the _source, or
the cost of parsing the _source is high, you can explicitly map some fields
to be stored instead.

Note that there is a cost to retrieving each stored field. For example,
if you have a JSON document with 10 fields of reasonable size, map all of
them as stored, and ask for all of them, this means loading each one (more
disk seeks), compared to just loading the _source (which is one field,
possibly compressed).

-shay.banon

(In reply to David Pilato, Thu, Nov 17, 2011 at 7:51 AM)

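Shay's cost note can be expressed as a toy counter. It simply encodes the model as described in the message (one read per stored field requested, one read for the whole _source) and does not measure real Elasticsearch or Lucene I/O, which also depends on the storage format, caching and compression. The function fetch is a hypothetical helper for illustration.

```python
# Toy cost model of the note above: it only counts "reads" the way the
# message describes them (one per stored field vs. one for the whole _source).
# It does not measure real Elasticsearch or Lucene I/O.
def fetch(requested, stored_fields, source_enabled=True):
    """Return (reads, origin) for loading `requested` fields of one document."""
    reads = 0
    origin = {}
    need_source = False
    for name in requested:
        if name in stored_fields:
            reads += 1                 # each stored field is loaded on its own
            origin[name] = "stored"
        elif source_enabled:
            need_source = True         # extracted from _source instead
            origin[name] = "_source"
        else:
            origin[name] = None        # not retrievable at all
    if need_source:
        reads += 1                     # _source is loaded (and parsed) once
    return reads, origin


fields = [f"field{i}" for i in range(1, 11)]
print(fetch(fields, stored_fields=set(fields))[0])  # 10
print(fetch(fields, stored_fields=set())[0])        # 1
```

With all ten fields mapped as stored, asking for all of them costs ten reads in this model; with none stored, it costs a single _source read, which is the trade-off Shay describes.
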
Ah, perfect! Thanks, David.

Great explanation, Shay. Thanks for taking the time to write that. It
really clears up all of my questions.

When do you want to enable storing specific fields? Most times, you
don't. Fetching the _source is fast and extracting it is fast as well. If
you have very large documents, where the cost of storing the _source, or
the cost of parsing the _source is high, you can explicitly map some fields
to be stored instead.

So if you are going to store the source (which is on by default) then
you shouldn't store individual fields as it offers no real advantage.
And if you have very large documents where you may be searching
multiple fields, but only need certain fields returned in the hit,
then you may choose to disable storing the source and store
individual fields instead.

Is this the intention?

Almost. Sometimes it also makes sense to store the _source and store other
fields specifically. For example, you may need the _source now and then in
order to reindex, but it can be quite big; to avoid paying the price of
loading and possibly parsing it, just fetch specific stored fields.

(In reply to Ray Ward, Mon, Dec 5, 2011 at 2:21 AM)

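Shay's "both" case might look like this: keep _source enabled (for example, for reindexing), also store a few small fields, and have searches return only those. This hedged sketch just builds the request bodies in Python. The type "article" and its fields are hypothetical; "store": "yes" and the "fields" search parameter follow the 0.x-era syntax, and newer releases use true/false and a parameter named stored_fields instead.

```python
import json

# Hypothetical mapping: keep _source and also store two small fields.
# 0.x-era syntax: "store" took "yes"/"no". "body" stands in for a large field.
mapping = {
    "article": {
        "_source": {"enabled": True},
        "properties": {
            "title": {"type": "string", "store": "yes"},
            "date": {"type": "date", "store": "yes"},
            "body": {"type": "string"},  # searchable; only available via _source
        },
    }
}

# Ask only for the stored fields, so the large _source does not have to be
# loaded and parsed for each hit.
search_body = {
    "query": {"term": {"body": "reindex"}},
    "fields": ["title", "date"],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```
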
So let me get this idea right.
If store is yes and _source is enabled, are two copies of the same field
stored in ES?
Also, is compression applied to stored fields too?

I am looking at all possible ways to reduce storage size without
sacrificing the highlighting functionality.
If there is any other good option apart from the aforementioned, let me
know.

Thanks
Vineeth

(In reply to Shay Banon, Tue, Dec 6, 2011 at 12:52 AM)

Hi,

I have much the same question regarding highlighting of attachments.

It seems that if you disable _source for documents with an attachment field,
you can't highlight them, even if you mark the attachment field as stored.

The documentation says:

In order to perform highlighting, the actual content of the field is
required. If the field in question is stored (has store set to yes in the
mapping), it will be used, otherwise, the actual _source will be loaded and
the relevant field will be extracted from it.

@Shay, can you confirm that, or should I run more tests to find out how to do it?

David.

(In reply to Vineeth Mohan, Friday 17 February 2012 at 04:09)

I tried some tests; below are the index sizes for each configuration.
PS: the text I used was base64 of some binary data.

store=yes, _source=enabled                1.3 MB
store=no,  _source=disabled               1 MB
store=no,  _source=disabled               600 KB
store=no,  _source=enabled, compress=on   900 KB

So I conclude that, since _source is enabled by default, setting store=yes
stores the same string twice.

Thanks
Vineeth

(In reply to David Pilato, Fri, Feb 17, 2012 at 1:42 PM)


Yes, when the field is stored, then it will be used for highlighting. That's the behavior.

(In reply to David Pilato, Friday, February 17, 2012 at 10:12 AM)

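For the highlighting case, a request against a stored field might look like the sketch below. It assumes a plain string field named "content" (a hypothetical name) rather than the attachment plugin's own mapping layout, and uses the 0.x-era "store": "yes" syntax. Per Shay's answer, when the field is stored, the highlighter uses that stored value instead of extracting it from _source.

```python
import json

# Hypothetical setup matching the question: _source disabled and the field
# to highlight mapped as stored (0.x-era "store": "yes" syntax).
mapping = {
    "doc": {
        "_source": {"enabled": False},
        "properties": {"content": {"type": "string", "store": "yes"}},
    }
}

# Search request that highlights matches in the stored "content" field.
search_body = {
    "query": {"term": {"content": "elasticsearch"}},
    "highlight": {"fields": {"content": {}}},
}

print(json.dumps(search_body, indent=2))
```
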
Is there any way to mark fields as stored using the Java API?

(In reply to kimchy, Thursday, 17 November 2011 18:49:20 UTC+5:30)

Also, how do I mark _source as not stored?

You can specify how _source is handled (stored, not stored, compressed)
in the mapping:

http://www.elasticsearch.org/guide/reference/mapping/source-field.html

(In reply to Saurabh, Wed, Jun 13, 2012 at 4:19 AM)

Hi All,

I have a related question about the following scenario.

  1. Say I am indexing a huge amount of data (say 500GB) per day.
  2. It contains about 15 fields, including attachments.
  3. In the current architecture, _source is left at its default (enabled).
  4. Almost all fields are also set to store: true.
  5. I run two types of query:
    a) retrieving docs by id, which returns the complete source, and
    b) on the basis of some fields, say 4 or 5.

Considering these queries and this amount of data, does it really make sense to store the _source as well as the individual fields? ES can return a field even if it is not stored, as long as _source is stored.

Some posts I read say that retrieving stored fields is faster when there are few fields and little data. But with large data, is it faster to retrieve stored fields or the _source?

Please confirm.

~Prashant

My understanding is that it is mostly more efficient to not store any
fields and just let Elasticsearch load them from the source when needed.

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd1jz8QDjd074xhJm2bDetVH4svzSbWc7970oCYWQw6wOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Loading from _source, especially in scripts, is slow, and uses additional
memory. Also note that a large _source field is compressed, which adds
another bit of CPU overhead.

With stored fields, you have finer control over these issues. So it might
make sense to compress large binary data fields
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#binary

Jörg

(In reply to Nikolas Everett, Mon, Sep 22, 2014 at 3:41 PM)