Additional impacts of storing _source?

Having read through https://www.elastic.co/guide/en/elasticsearch/reference/5.x/mapping-source-field.html a few times for different versions, I'm wondering what the less obvious impacts of storing _source are, besides increased disk utilization.

_source is only pulled back on a get, correct?

What's the memory impact of using _source? If my ES node has a shard loaded in RAM, does that shard pull into RAM all _source fields it holds? Or will the _source field be kept on disk until the get occurs?

How does the inverted index relate to the _source field? Non-issue?

Since _source appears to be stored as JSON, how would binary document formats be handled when storing _source? Converted to JSON? Stored as some sort of blob in the JSON body (encoded or some other way)?

Just curious here, as I'm weighing the impact of making this change across all of my customer use cases, from document-heavy (PDF, Word, Excel, etc.) to memory-constrained.

Cheers!

Claude

It is also returned with each search hit, unless you disable that.

The _source of multiple documents is chunked together and stored compressed. To load the _source for one document, the chunk has to be streamed through memory until we reach the document that you want. For a search, by default, we do this only for the hits returned. All the bytes of _source for those hits are held in memory and returned in the response. If you want, you can disable this on search but still store the _source in case you need it later. That is fine.
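As a rough illustration of disabling _source on search while still storing it, here is a minimal sketch of a search request body that sets `"_source": false`. The helper name `build_search_body` and the `title` field are made up for the example, and the body would be sent to the index's `_search` endpoint with whatever client you already use.

```python
import json


def build_search_body(query, return_source=True):
    """Build a search request body (hypothetical helper, not part of any client).

    When return_source is False, the body carries "_source": false, which asks
    Elasticsearch to leave _source out of each hit. The mapping is untouched,
    so _source is still stored and can be fetched later.
    """
    body = {"query": query}
    if not return_source:
        body["_source"] = False
    return body


# "title" is an assumed field name, used only for illustration.
body = build_search_body({"match": {"title": "report"}}, return_source=False)
print(json.dumps(body, indent=2))
```

Keeping the switch behind a single argument makes it easy to test both ways, for example turning it off only in memory-constrained environments.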

Sometimes we have to do more than just shuffle the bits around, such as during highlighting and source filtering. But not always.

We support storing JSON, YAML, CBOR, and SMILE. Personally, I'd usually skip putting binary data into Elasticsearch.
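If binary content does have to travel inside a JSON document, the usual approach is to carry it as a Base64-encoded string (this is what the `binary` field type expects). The sketch below only shows the encoding round trip. The helper name `encode_attachment` and the `filename` and `data` field names are assumptions for the example, not anything Elasticsearch requires.

```python
import base64
import json


def encode_attachment(raw_bytes, filename):
    """Wrap raw bytes as a JSON-safe document (hypothetical helper and field names)."""
    return {
        "filename": filename,
        # Base64 turns arbitrary bytes into ASCII text that JSON can carry.
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }


doc = encode_attachment(b"%PDF-1.4 example bytes", "report.pdf")
payload = json.dumps(doc)

# Decoding the string recovers the original bytes exactly.
restored = base64.b64decode(json.loads(payload)["data"])
```

Base64 inflates the payload by roughly a third, and the encoded string becomes part of _source, so large files make every stored and returned _source larger.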

@nik9000,

How would I disable returning _source on search as you mention? I probably want to test both ways but I imagine turning that off behind a flag would be most useful for some of my memory constrained environments.

TIA!

Claude

Re-ping @nik9000
