Best way to make a consistent copy of data

ankh · January 4, 2023, 4:27am

I have a system containing billions of documents. I need a consistent point-in-time copy of a subset of fields from one year's worth of data (e.g. 200M documents). The documents are subject to change. Only a small fraction of them actually change, but that is enough for me to want an assured consistent, complete copy of the data, "frozen" for future reference. The copy will be used by another system and kept as evidence of why that system produced its output.

The options I can think of are:

1. Extract the data from Elastic and write to another permanent store that will use this data.
2. Snapshot the data within Elastic.

For (1) I guess something like point-in-time is the way to ensure consistency? We are running 6.2.3 and so don't have access to features like this, so I suppose we would have to risk inconsistency.

For (2) we would still need to extract the data to a permanent store, as the Elastic store is not truly permanent for us. However, I'm not sure I correctly understand how the snapshot process handles data consistency.

What would be the best way to achieve this?

warkolm · January 4, 2023, 7:03am

You could also clone the index.
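
For reference, a rough sketch of the two REST calls a clone involves. Note the assumptions: the clone index API only exists from Elasticsearch 7.4 onward, so it is not an option on 6.2.3 without upgrading; it copies the whole index rather than a subset of fields; and the source index has to be write-blocked first. The index names and the helper function are hypothetical, and the snippet only builds the requests rather than sending them.

```python
def clone_requests(source, target):
    """Return (method, path, body) tuples for write-blocking and cloning an index.

    Assumes Elasticsearch 7.4+; `source` and `target` are hypothetical index names.
    """
    return [
        # The source index must be made read-only before it can be cloned.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # Clone the source into a new index; the target must not already exist.
        ("POST", f"/{source}/_clone/{target}", None),
    ]


if __name__ == "__main__":
    for method, path, body in clone_requests("docs-2022", "docs-2022-frozen"):
        print(method, path, body)
```

The write block stops changes to the source while it is in place, so this fits best when the year's data already lives in its own index.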

6.2.3 is very old and very EOL. You would get a tonne of benefit from upgrading ASAP.
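
If upgrading is not possible right away, the scroll API, which does exist in 6.x, may cover option (1): scroll results reflect the state of the index at the time of the initial search request, so later updates should not show up in the export. Below is a minimal sketch under that assumption. The function names, the injected `send` transport, the page size and the keep-alive value are all illustrative, and it would be worth verifying the behaviour on 6.2.3 with concurrent updates before relying on it.

```python
import json
import urllib.request


def http_json(method, url, body=None):
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scroll_export(base_url, index, query, fields, send=http_json,
                  page_size=1000, keep_alive="5m"):
    """Yield (_id, _source) for every matching document, restricted to `fields`."""
    first = {
        "size": page_size,
        "_source": fields,   # only the subset of fields that is needed
        "query": query,
        "sort": ["_doc"],    # index order, the cheapest order for scrolling
    }
    page = send("POST", f"{base_url}/{index}/_search?scroll={keep_alive}", first)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"], hit["_source"]
            # Each follow-up call renews the keep-alive and returns the next page.
            page = send("POST", f"{base_url}/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the search context on the cluster once done or abandoned.
        if scroll_id:
            send("DELETE", f"{base_url}/_search/scroll", {"scroll_id": scroll_id})
```

A caller would pass something like a range query on the date field (for example a hypothetical `timestamp` field covering the year) and write each yielded document to the permanent store. The keep-alive has to be long enough to process one page, not the whole export.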

system · February 1, 2023, 7:03am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
