A few questions on shards, replicas, and data loss in Elasticsearch

Hi,

I'm new to Elasticsearch.

I have a few questions about designing an Elasticsearch architecture.

  1. How do I re-index if a shard holding a large amount of data gets corrupted? What is the best way to do it?

  2. What is the delta time between a primary shard and its replica shard? How do we measure it?

  3. How do I quantify indexing with a large dataset?

  4. What happens if a primary shard gets corrupted before replication?

  5. What is the best practice for scaling Elasticsearch if we receive 1 TB of data per day, eventually increasing to 100 TB per day? In this scenario, what is the recommended number of shards and replicas?

  6. Is there a sizing tool for Elasticsearch that considers CPU/RAM/storage/network? If not, how do we calculate it?

  7. Does Elasticsearch compress data? If it does, which compression method works best, and how does it restore the data when required?

  8. Does Elasticsearch have an archive mechanism?

  9. Can I use an NFS volume for Elasticsearch?

  10. During an Elasticsearch software upgrade, how much downtime should I plan for?

  11. What best practice should I follow if my entire data set gets corrupted?

  12. While pushing data from Logstash to Elasticsearch, if part of the data gets corrupted, what should I do?

I realize I have asked a lot of questions, but I'm eager to know the answers.

Your help is appreciated.

-Saravanan

I'm going to pick some questions and answer those. That might answer the others.

What is the delta time between a primary shard and its replica shard? How do we measure it?

I don't know the exact figure. But the only thing you actually need to know is that Elasticsearch responds to an index request only after the document has been added to both the primary and the replica. So in practice it does not really matter.
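If you want to check this from the client side, the response to an index request carries a _shards section with the number of shard copies the write was meant for and the number it succeeded on. Below is a minimal Python sketch that reads that section. The helper name replication_summary and the sample response are made up for illustration; only the _shards fields (total, successful, failed) come from the Elasticsearch response format.

```python
def replication_summary(index_response):
    """Summarise the `_shards` section of an index response.

    `total` is the number of shard copies (primary plus replicas) the
    write should be executed on; `successful` is how many it succeeded on.
    """
    shards = index_response["_shards"]
    total = shards["total"]
    successful = shards["successful"]
    failed = shards.get("failed", 0)
    return {
        "total": total,
        "successful": successful,
        "failed": failed,
        # True only when every expected copy acknowledged the write.
        "fully_replicated": failed == 0 and successful == total,
    }


# Hypothetical response for an index with 1 primary and 1 replica.
sample = {
    "_index": "logs",
    "_id": "1",
    "result": "created",
    "_shards": {"total": 2, "successful": 2, "failed": 0},
}
print(replication_summary(sample))
```

A successful count lower than total with no failures usually means a replica copy was unassigned when the write happened, not that the replica is lagging behind.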

Is there a sizing tool for Elasticsearch that considers CPU/RAM/storage/network? If not, how do we calculate it?

Does Elasticsearch compress data? If it does, which compression method works best, and how does it restore the data when required?

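Yes. LZ4 by default, or DEFLATE if you set index.codec to best_compression. Decompression happens transparently when the data is read. See the Index Modules page in the Elasticsearch Reference [5.5].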
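As a rough illustration, the codec is chosen through the index settings. The Python sketch below only builds the JSON body you would send when creating an index (PUT /<index-name>); it does not talk to a cluster. The function name and the shard and replica numbers are placeholders, not recommendations. Note that index.codec is a static setting, so it is set at index creation time or on a closed index.

```python
import json


def create_index_body(number_of_shards, number_of_replicas, best_compression=False):
    """Build the request body for creating an index.

    With best_compression=True the stored fields are compressed with
    DEFLATE instead of the default LZ4.
    """
    index_settings = {
        "number_of_shards": number_of_shards,
        "number_of_replicas": number_of_replicas,
    }
    if best_compression:
        index_settings["codec"] = "best_compression"
    return {"settings": {"index": index_settings}}


# Placeholder values, for illustration only.
body = create_index_body(5, 1, best_compression=True)
print(json.dumps(body, indent=2))
```

The trade-off is the usual one: best_compression saves disk space at the cost of slower stored-field reads.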

Does Elasticsearch have an archive mechanism?

Snapshot/Restore
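For reference, a snapshot workflow comes down to three REST calls: register a repository, take a snapshot, and restore it later. The Python sketch below just assembles those requests as (method, path, body) tuples so you can send them with whatever HTTP client you use. The repository name my_backup, the location /mount/backups and the index name are hypothetical, and a shared filesystem repository also requires the location to be listed under path.repo on every node.

```python
import json


def register_fs_repository(repo, location):
    """Request that registers a shared filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices):
    """Request that snapshots the given indices and waits for completion."""
    path = f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return ("PUT", path, {"indices": ",".join(indices)})


def restore_snapshot(repo, snapshot, indices):
    """Request that restores the given indices from a snapshot."""
    path = f"/_snapshot/{repo}/{snapshot}/_restore"
    return ("POST", path, {"indices": ",".join(indices)})


# Hypothetical names, for illustration only.
requests_to_send = [
    register_fs_repository("my_backup", "/mount/backups"),
    create_snapshot("my_backup", "snapshot_1", ["logs-2017.07"]),
    restore_snapshot("my_backup", "snapshot_1", ["logs-2017.07"]),
]
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

An index that is still open cannot be restored over, so close or delete the existing index first, or restore under a different name.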

Can I use an NFS volume for Elasticsearch?

No. Absolutely not recommended.

During an Elasticsearch software upgrade, how much downtime should I plan for?

Zero, if you follow the rolling upgrade procedure.
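To give an idea of what that procedure involves, here is a sketch of the per-node steps as described in the 5.x rolling upgrade guide, written as a Python list of requests. It is not a complete upgrade script: the function name and the MANUAL marker are my own, and later releases changed some of these steps (synced flush, for example, was removed in 8.0), so check the guide for your version.

```python
def rolling_upgrade_steps(node_name):
    """Return the ordered steps for upgrading one node (5.x-style procedure).

    Each step is (method, path_or_description, body). "MANUAL" marks work
    done on the host itself and is not an Elasticsearch API call.
    """
    def allocation(value):
        return {"transient": {"cluster.routing.allocation.enable": value}}

    return [
        # 1. Stop the cluster from moving shards while the node is down.
        ("PUT", "/_cluster/settings", allocation("none")),
        # 2. Optional: synced flush speeds up shard recovery afterwards.
        ("POST", "/_flush/synced", None),
        # 3. Upgrade the node itself.
        ("MANUAL", f"stop, upgrade and restart node {node_name}", None),
        # 4. Confirm the node has rejoined the cluster.
        ("GET", "/_cat/nodes", None),
        # 5. Let shards be allocated again.
        ("PUT", "/_cluster/settings", allocation("all")),
        # 6. Wait for the cluster to recover before the next node.
        ("GET", "/_cat/health", None),
    ]


for step in rolling_upgrade_steps("node-1"):
    print(step)
```

Repeat the same sequence for each node, one at a time, and move on only once the cluster health is back to green.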

What best practice should I follow if my entire data set gets corrupted?

Reindex.
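If a readable copy of the data still exists inside Elasticsearch (for example an index restored from a snapshot), the _reindex API can copy it into a fresh index. The Python sketch below builds that request body; the function name and index names are hypothetical. If no readable copy exists in the cluster, reindexing means loading the documents again from the original source, which this sketch does not cover.

```python
import json


def reindex_body(source_index, dest_index):
    """Build the body for POST /_reindex, copying one index into another."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


# Hypothetical index names, for illustration only.
body = reindex_body("logs-restored", "logs-rebuilt")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

Create the destination index with the settings and mappings you want beforehand, since _reindex does not copy them from the source.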

Thanks, David :)
