- Ingesting 60 million JSON documents once a day (not real-time) via a batch script (a rough sketch of the load is below this list)
- Elasticsearch to query 90 days' worth of data, for a total of 5.4 billion documents
- Occupies about 2 TB of storage overall (23 GB per day and 161 GB a week)
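For context, the daily load looks roughly like this (a minimal sketch using the Python elasticsearch client; the index name and the `read_daily_docs()` generator are placeholders, not our real script):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])  # assumption: cluster endpoint


def read_daily_docs():
    """Hypothetical generator yielding the ~60M JSON documents for one day."""
    yield from ()  # placeholder for the real data source


# Stream the daily batch through the bulk helper instead of one request per document.
actions = (
    {"_index": "events-2024.01.01", "_source": doc}  # index name is illustrative
    for doc in read_daily_docs()
)
for ok, info in helpers.parallel_bulk(es, actions, chunk_size=5000):
    if not ok:
        print("bulk failure:", info)
```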
Is it possible to get good query times with this in Elasticsearch? On a non-SSD cluster (6 nodes) we currently see ~1 sec average query times (5 shards and one huge monolithic index holding ALL 90 days of data).
Since the ideal shard size is 30-60 GB, should we switch to weekly indices (instead of one huge monolithic 90-day index), with 5 shards each?
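In practice I'm picturing something like this (a minimal sketch with the Python client; index names and settings are illustrative, and we'd likely use an index template rather than explicit creates):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# One index per week, 5 primary shards each:
# ~161 GB/week over 5 shards is ~32 GB per shard, inside the 30-60 GB sweet spot.
es.indices.create(
    index="events-2024-w01",  # illustrative weekly index name
    body={"settings": {"number_of_shards": 5, "number_of_replicas": 1}},
)

# A 90-day query would then fan out across the ~13 weekly indices via a wildcard.
resp = es.search(index="events-*", body={"query": {"match_all": {}}, "size": 0})
print(resp["hits"]["total"])
```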
Each indexed JSON document has 16 fields, mostly text, plus 2 keywords and 1 long. Will ES be able to perform? Everyone speaks so highly of ES, yet when I start asking questions I get open-ended answers ("It can, but it depends on the hardware"), which is kind of disappointing.
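For reference, the mapping is roughly shaped like this (field names are placeholders; only the type mix of mostly text with 2 keyword and 1 long matches ours):

```python
# Rough shape of the mapping: 16 fields total, mostly text, 2 keyword, 1 long.
# Field names are placeholders, not our actual schema.
mapping = {
    "properties": {
        "message": {"type": "text"},
        "details": {"type": "text"},
        # ... remaining text fields omitted ...
        "status": {"type": "keyword"},
        "source": {"type": "keyword"},
        "timestamp_ms": {"type": "long"},
    }
}
```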
Can someone advise, or maybe share their own story/journey? Thanks all!