Best ways to prevent data loss in Elasticsearch


(hoey) #1

We have decided to use Elasticsearch as our data store and to handle our different search scenarios with it.
The data is critical and must never be lost; we need to retain it for 10 days.
Because the data is time series, we want to create an index per hour, so that different queries are routed to different indices. The other reason we chose hourly indices is that it makes it easy to remove older data (after 10 days).
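The hourly-index scheme described above can be sketched in a few lines of Python. The `events-` prefix and the exact name format are assumptions for illustration, not anything Elasticsearch mandates:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 10  # keep 10 days of data, as described above

def hourly_index(ts: datetime) -> str:
    """Name of the hourly index a document with timestamp `ts` belongs to."""
    return ts.strftime("events-%Y.%m.%d.%H")  # 'events-' prefix is an assumption

def expired_indices(now: datetime, existing: list[str]) -> list[str]:
    """Indices older than the retention window, i.e. safe to delete."""
    cutoff_name = hourly_index(now - timedelta(days=RETENTION_DAYS))
    # lexicographic order matches chronological order for this name pattern
    return [name for name in existing if name < cutoff_name]
```

Deleting a whole expired index is a single `DELETE <index>` call, which is far cheaper than deleting individual old documents out of one big index.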
Because we chose Elasticsearch as the primary data store, we are concerned about losing our data.
Does Elasticsearch have a solution for handling data loss?
Is snapshot and restore suitable for that?
Or should we have a primary storage and use Elasticsearch as a secondary data store?


(David Pilato) #2

Read https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

If you absolutely need a 100% guarantee, I'd use Logstash to send the data both to Elasticsearch and to another datastore (HDFS, FS, Oracle, PostgreSQL, ...).

That way you will have the data in both places in case of any hard crash.
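A minimal sketch of that dual-output idea as a Logstash pipeline, assuming a Beats input and a local file as the second store (paths, ports, and index names are placeholders):

```
input {
  beats { port => 5044 }
}

output {
  # searchable copy in Elasticsearch, routed to an hourly index
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events-%{+YYYY.MM.dd.HH}"
  }
  # safeguard copy on the filesystem (swap for HDFS, a database, ...)
  file {
    path => "/var/backup/events-%{+YYYY.MM.dd.HH}.log"
  }
}
```

Logstash sends every event to all outputs, so a crash of one store still leaves the data in the other.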


(hoey) #3

1- I have decided to create hourly indices and take a snapshot of each of them.
So, to handle data loss for already indexed data, I think I can restore a corrupted index from its snapshot.
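For reference, the snapshot-per-index plan maps onto the snapshot and restore API roughly like this (repository name, path, and index name are placeholders; the `fs` repository path must be listed in `path.repo` on every node):

```
# register a shared-filesystem snapshot repository
PUT _snapshot/hourly_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}

# snapshot one hourly index
PUT _snapshot/hourly_backup/events-2024.05.20.12?wait_for_completion=true
{
  "indices": "events-2024.05.20.12"
}

# restore it after deleting or closing the corrupted copy
POST _snapshot/hourly_backup/events-2024.05.20.12/_restore
{
  "indices": "events-2024.05.20.12"
}
```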

2- On the other hand, to handle data loss at insertion time, I'm thinking about using Logstash with Kafka (Logstash consumes the data from Kafka).

Do these solutions overcome data loss?


(David Pilato) #4
  1. Yes, most likely.
  2. Yes, that could be a good safeguard.

You can always consume the data from Kafka with Logstash and ask Logstash to store the raw data in whatever datastore you want (S3, HDFS, ...) and index the enriched data into Elasticsearch.
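A sketch of that Kafka-fed pipeline, with placeholder broker, topic, bucket, and index names. Note that Logstash filters run before all outputs, so to keep a truly raw copy alongside an enriched one you would either preserve the original message in a field or split the work across two pipelines:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics => ["events"]
  }
}

output {
  # durable copy in S3
  s3 {
    bucket => "raw-events"
    region => "us-east-1"
  }
  # searchable copy in Elasticsearch, routed to an hourly index
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events-%{+YYYY.MM.dd.HH}"
  }
}
```

With Kafka in front, events that arrive while Elasticsearch is down stay in the topic and are consumed once it recovers, which addresses the insertion-time loss in point 2.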


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.