_source vs stored fields

Hi,

ElasticSearch supports two ways of storing data. Every field has a store
option, and there is a special _source field which stores the original JSON.
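
To make the two options concrete, here is a minimal sketch of a mapping that uses both, written as a plain Python dict so no client library is needed. The type name "doc" and the field names are made up for illustration, and the "string" type and "yes"/"no" values for store are assumptions based on the mapping syntax of this era; newer versions may expect different types and boolean values.

```python
import json

# Hypothetical index mapping; "doc", "title", "body" and "date" are
# illustrative names, not taken from any real application.
mapping = {
    "mappings": {
        "doc": {
            # _source keeps the original JSON document as it was submitted.
            "_source": {"enabled": True},
            "properties": {
                # Per-field option: keep this field's value separately as well.
                "title": {"type": "string", "store": "yes"},
                # Indexed for search, but retrievable only through _source.
                "body": {"type": "string", "store": "no"},
                "date": {"type": "date", "store": "no"},
            },
        }
    }
}


def stored_fields(mapping, doc_type):
    """Return the names of fields whose mapping sets store to "yes"."""
    props = mapping["mappings"][doc_type]["properties"]
    return sorted(name for name, cfg in props.items() if cfg.get("store") == "yes")


print(json.dumps(mapping, indent=2))
print(stored_fields(mapping, "doc"))
```

The helper only inspects the dict; it does not talk to a cluster.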

I was wondering what the rules of thumb are for choosing one over the other
(or both?), with performance being the most important factor.

Our application needs to support two main use cases:

  1. Straightforward searching, where full documents are returned, with hit
    highlighting.
  2. Faceting where we need access to document tokens and metadata fields
    like date.
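
As a rough sketch, the two use cases could be combined in one search request body like the one below. The field names ("body", "date") and facet names are hypothetical, and the structure assumes the facets API as it existed at the time of this thread; later releases replaced facets with aggregations.

```python
import json


def build_search_request(text):
    """Build a request body covering both use cases (illustrative only)."""
    return {
        # Use case 1: full-text search with hit highlighting.
        "query": {"query_string": {"query": text, "default_field": "body"}},
        "highlight": {"fields": {"body": {}}},
        # Use case 2: facets over document tokens and a date field.
        "facets": {
            "top_terms": {"terms": {"field": "body"}},
            "per_day": {"date_histogram": {"field": "date", "interval": "day"}},
        },
    }


request_body = build_search_request("stored fields")
print(json.dumps(request_body, indent=2))
```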

Thanks for your advice,
Boaz

Hi Boaz

Our application needs to support two main use cases:

  1. Straightforward searching, where full documents are returned, with
    hit highlighting.
  2. Faceting where we need access to document tokens and metadata
    fields like date.

You don't need stored fields for either of these. Also, take a look
at:
https://groups.google.com/d/topic/elasticsearch/j8cfbv-j73g/discussion

clint

Hi Clint,

Thank you for the reply. I was wondering how this relates to the field
cache - does it store the content of the _source field or does it store the
tokenized version of regular fields (or maybe I'm completely off thinking
it's related).

Cheers,
Boaz

On Friday, June 8, 2012 11:58:24 PM UTC+2, Clinton Gormley wrote:

Hi Boaz

You don't need stored fields for either of these. Also, take a look
at:
https://groups.google.com/d/topic/elasticsearch/j8cfbv-j73g/discussion

clint

Hi Boaz

Thank you for the reply. I was wondering how this relates to the field
cache - does it store the content of the _source field or does it store
the tokenized version of regular fields (or maybe I'm completely off
thinking it's related).

As I understand it, the field cache stores tokens, so the _source field
and the store parameter are irrelevant to it: both of these just store the
original value, not the tokenized value.

clint
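
The distinction between original values and tokens can be illustrated with a toy sketch. This is not how ElasticSearch implements the field cache; the whitespace tokenizer, the sample documents, and all names below are assumptions made purely for illustration.

```python
from collections import Counter


def tokenize(text):
    """Toy analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


# Two made-up documents.
docs = {
    1: {"body": "Quick brown fox"},
    2: {"body": "Quick red fox"},
}

# What _source or a stored field keeps: the original value, untouched.
original_values = {doc_id: doc["body"] for doc_id, doc in docs.items()}

# What a token-based cache would hold: the analyzed terms per document.
tokens_by_doc = {doc_id: tokenize(doc["body"]) for doc_id, doc in docs.items()}


def term_facet(tokens_by_doc):
    """Count in how many documents each token appears."""
    counts = Counter()
    for tokens in tokens_by_doc.values():
        counts.update(set(tokens))
    return counts


facet = term_facet(tokens_by_doc)
print(original_values[1])
print(dict(facet))
```

In this sketch the facet counts are computed only from the tokens, so they look the same whether or not the original text was kept anywhere.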