Additional impacts of storing _source?

khyron4eva · May 16, 2017, 6:21pm

Having read through https://www.elastic.co/guide/en/elasticsearch/reference/5.x/mapping-source-field.html a few times for different versions, I have to wonder what are the less obvious impacts of storing _source besides increased disk utilization?

_source is only pulled back on a get, correct?

What's the memory impact of using _source? If my ES node has a shard loaded in RAM, does that shard pull into RAM all _source fields it holds? Or will the _source field be kept on disk until the get occurs?

How does the inverted index relate to the _source field? Non-issue?

Since _source appears to be stored as JSON, how would binary document formats be handled when storing _source? Converted to JSON? Stored as some sort of blob in the JSON body (encoded or some other way)?

Just curious here as I'm considering the impact of making this change across all of the customer use cases I have, from document heavy (PDF, Word, Excel, etc.) to memory constrained.

Cheers!

Claude

nik9000 · May 16, 2017, 6:37pm

_source is returned on a search hit as well, unless you disable it.

The _source from multiple documents is chunked together and stored compressed. When you want to load _source for some document, the chunk has to be streamed through memory until we get to the document that you want. For a search, by default, we do this only for the hits returned. All the bytes of _source for those hits are kept in memory and returned in the response. If you want, you can disable returning it on search but still store the _source in case you need it later. That is fine.
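As a rough sketch of that last point: a search request body can carry `"_source": false`, so hits come back without their source while the index still stores it. The helper name `search_body_without_source` and the `title` field are made up for illustration; the resulting body would be sent to the `_search` endpoint with whatever client you already use.

```python
import json


def search_body_without_source(query):
    """Build a search body asking Elasticsearch not to return _source for hits.

    `query` is any query DSL dict; the field names used below are hypothetical.
    """
    return {"query": query, "_source": False}


body = search_body_without_source({"match": {"title": "elasticsearch"}})
# Serialize the body as it would be sent over HTTP.
print(json.dumps(body, sort_keys=True))
```

Because this is a per-request setting, it can sit behind a flag in the application and be switched on only in memory-constrained environments.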

Sometimes we have to do more than just shuffle the bits around, like during highlighting and source filtering. But not always.
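For reference, source filtering is requested per search by giving `_source` a set of include and exclude patterns instead of a plain boolean. The sketch below only builds such a body; the helper name and the field names (`title`, `author.*`, `attachment`) are assumptions for illustration.

```python
def search_body_with_source_filter(query, includes, excludes=()):
    """Build a search body that returns only part of each hit's _source.

    `includes` and `excludes` are lists of field names or wildcard patterns.
    """
    source_filter = {"includes": list(includes)}
    if excludes:
        source_filter["excludes"] = list(excludes)
    return {"query": query, "_source": source_filter}


filtered = search_body_with_source_filter(
    {"match_all": {}},
    includes=["title", "author.*"],  # hypothetical field names
    excludes=["attachment"],         # e.g. leave a large field out of the response
)
```

This trims what is returned in the response; as noted above, the source still has to be loaded and parsed on the Elasticsearch side to apply the filter.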

We support storing JSON, YAML, CBOR, and SMILE. Personally, I'd usually skip putting binary data into Elasticsearch.
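If binary content does have to travel inside a document, one common approach (an assumption here, not something recommended above) is to Base64-encode it into a string value, which is the form the `binary` field type accepts. A minimal sketch, with a hypothetical helper `wrap_binary` and hypothetical field names `filename` and `data`:

```python
import base64
import json


def wrap_binary(filename, raw_bytes):
    """Return a JSON-serializable document carrying raw bytes as Base64 text."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"filename": filename, "data": encoded}


doc = wrap_binary("report.pdf", b"%PDF-1.4 example bytes")
payload = json.dumps(doc)  # what would be sent as the document body
```

Base64 grows the payload by roughly a third, and that encoded string becomes part of the stored _source, which is one reason to keep large files elsewhere and index only the extracted text.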

khyron4eva · May 17, 2017, 2:55pm

@nik9000,

How would I disable returning _source on search as you mention? I probably want to test both ways but I imagine turning that off behind a flag would be most useful for some of my memory constrained environments.

TIA!

Claude

khyron4eva · May 23, 2017, 1:56am

Re-ping @nik9000

system · June 20, 2017, 1:56am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
