Question about source data

tesseract · January 28, 2016, 5:25pm

Hey,
We are currently storing the source for each document we index in Elasticsearch. Is it possible to get the raw source of each document as a file from Elasticsearch? How is source data stored on disk compared to indices?
Ideally, we want a way to take the data directly off the disk and push it to another server rather than having to query Elasticsearch for all the documents. The reason is that we want to run a job every day that pushes the source (the raw request) for the past 24 hours to more permanent storage. We only keep 24 hours' worth of data in Elasticsearch, but we need the source (which we update 2-3 times while it's in Elasticsearch) for a longer duration for some machine learning jobs.

polyfractal · January 28, 2016, 6:25pm

Nope, I'm afraid there's no way to do that at the moment.

The _source is stored in a Lucene "stored field", and lives with the inverted index in the segment files. There's no way to extract it without using something like a scroll to iterate over all the docs.

That said, iterating over all the docs with a big scroll should be relatively quick, since it's optimized for bulk exporting data from the segments.
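
For what it's worth, here is a rough sketch of what such a scroll export could look like in Python, using only the standard library against the REST scroll endpoints. The node address, the `export_sources` and `http_post` helper names, the page size, and the keep-alive value are all assumptions for illustration, not anything from your setup, and the `Content-Type` header is only required on newer Elasticsearch versions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node reachable on the default port


def http_post(path, body):
    """POST a JSON body to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def export_sources(index, out, post=http_post, page_size=1000, keep_alive="1m"):
    """Write the _source of every doc in `index` to `out`, one JSON object per line.

    `post` is injectable so the loop can be exercised without a live cluster.
    Returns the number of documents written.
    """
    # Sorting by _doc skips scoring, which is the cheapest order for a full export.
    page = post(
        "/%s/_search?scroll=%s" % (index, keep_alive),
        {"size": page_size, "sort": ["_doc"], "query": {"match_all": {}}},
    )
    count = 0
    # An empty hits array signals that the scroll is exhausted.
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            out.write(json.dumps(hit["_source"]) + "\n")
            count += 1
        # Each reply carries the scroll_id to use for the next page.
        page = post(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": page["_scroll_id"]},
        )
    return count
```

For the daily job you would call something like `export_sources("my-index", open("dump.jsonl", "w"))`, where the index name and file name are placeholders. To export only the last 24 hours, the `match_all` query could be swapped for a `range` query on whatever timestamp field your documents carry.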
