I'm planning to use Elasticsearch for reporting. I'm very new to Elasticsearch, so I'm looking for advice in the areas below.
My data set is roughly estimated at 5 billion documents. It contains only one type of data, hence one index. I need to keep data for a 3-month period. The data will be searched to generate reports, so I already know most of the queries I will run. The main requirement is low response time. I also cannot lose a single document.
To begin with index design, I'm using 1000 shards and 1 replica. Is 1000 shards too many? Is replica = 1 good enough, given that I don't want to lose any documents?
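To make these numbers concrete, here is a rough back-of-the-envelope sketch of what the settings imply. It assumes the 5 billion figure counts documents, that documents spread evenly across primaries, and that shard copies balance evenly across the 2 initial data nodes. None of this is measured.

```python
import json

# Figures taken from my description; the even distribution is an assumption.
TOTAL_DOCS = 5_000_000_000
PRIMARY_SHARDS = 1000
REPLICAS = 1
DATA_NODES = 2  # initial number of data nodes

# Settings body I plan to send when creating the index.
index_settings = {
    "settings": {
        "number_of_shards": PRIMARY_SHARDS,
        "number_of_replicas": REPLICAS,
    }
}

# Documents per primary shard, assuming an even spread.
docs_per_primary = TOTAL_DOCS // PRIMARY_SHARDS

# Each primary has REPLICAS extra copies, so count primaries plus replicas.
total_shard_copies = PRIMARY_SHARDS * (1 + REPLICAS)

# Shard copies hosted per data node, assuming even balancing.
shards_per_data_node = total_shard_copies // DATA_NODES

print(json.dumps(index_settings, indent=2))
print("docs per primary shard:", docs_per_primary)
print("total shard copies:", total_shard_copies)
print("shard copies per data node:", shards_per_data_node)
```

Under these assumptions each primary would hold about 5 million documents, and each of the two data nodes would host about 1000 shard copies. That per-node count is the part I am unsure about.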
I have fields that should not be searchable but should still appear in results returned by searches on other fields. Is there any benefit to marking those as index : false? I'm looking for high query performance and reduced disk usage.
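To illustrate what I mean, here is a minimal sketch of the kind of mapping I have in mind, built as a Python dict. The field names (customer_id, created_at, display_note) are hypothetical, and the exact mapping syntax depends on the Elasticsearch version, so treat the structure as an assumption rather than a tested mapping.

```python
import json

# Hypothetical field names, for illustration only.
# "display_note" should be returned in results but never searched on.
mapping = {
    "properties": {
        "customer_id": {"type": "keyword"},
        "created_at": {"type": "date"},
        "display_note": {"type": "keyword", "index": False},
    }
}


def searchable_fields(mapping):
    """Return the names of fields that are not marked index: false."""
    return sorted(
        name
        for name, spec in mapping["properties"].items()
        if spec.get("index", True)
    )


print(json.dumps(mapping, indent=2))
print("searchable:", searchable_fields(mapping))
```

The helper is only there to make explicit which fields I expect to query on. The question is whether excluding the rest from the index actually saves disk space or speeds up queries.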
For the cluster, I'm starting with 3 dedicated master nodes, 2 data nodes (which I expect to grow later), and 1 client node. Is this a good cluster to start with? (HA and avoiding split brain are the focus here.)
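From what I have read, on Elasticsearch versions before 7.0 split brain is avoided by setting discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes. Here is a small sketch of that calculation for my planned layout, assuming that rule applies to the version I end up running:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority (quorum) of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


# My planned layout has 3 dedicated master nodes.
print(minimum_master_nodes(3))
```

With 3 dedicated masters this gives a quorum of 2, so the cluster should tolerate losing one master node. Please correct me if I have misunderstood this.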
For index deletion, I'm thinking of creating monthly indices and then deleting those that are 3 months old using Curator. Would this deletion affect any queries running at that time? Is there a better approach?
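The retention rule I have in mind can be sketched as below. This is not Curator configuration, only the selection logic. The index naming scheme (reports-YYYY.MM) is hypothetical, and I am assuming an index is deleted once its month is more than 3 months before the current month, so that at least 3 full months stay available.

```python
from datetime import date

INDEX_PREFIX = "reports-"  # hypothetical naming scheme: reports-YYYY.MM
RETENTION_MONTHS = 3


def age_in_months(index_name, today):
    """Whole calendar months between the index's month and today's month."""
    year, month = index_name[len(INDEX_PREFIX):].split(".")
    return (today.year - int(year)) * 12 + (today.month - int(month))


def expired_indices(index_names, today, retention=RETENTION_MONTHS):
    """Return indices whose month is more than `retention` months old."""
    return [
        name
        for name in index_names
        if name.startswith(INDEX_PREFIX)
        and age_in_months(name, today) > retention
    ]


existing = [
    "reports-2023.10",
    "reports-2023.11",
    "reports-2023.12",
    "reports-2024.01",
    "reports-2024.02",
]
print(expired_indices(existing, date(2024, 2, 15)))
```

In this example only the October index would be selected for deletion in mid-February, while November through February would be kept.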
Do I need to take manual backups, or would replica = 1 be enough?
Is there any other advice, or are there areas I should look into further?