Querying 5.4 Billion JSON Documents occupying 2 TB - is it possible in Elasticsearch?

  1. Ingesting 60 million JSON documents via a batch script that runs once a day (not real-time); see the ingestion sketch after this list
  2. Elasticsearch needs to query 90 days' worth of data, for a total of 5.4 billion documents
  3. The data occupies about 2 TB of storage overall (roughly 23 GB per day, 161 GB per week)
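
For context, the nightly load is conceptually something like this (a minimal sketch using the official Python client; the endpoint, index name, and field names are placeholders, not our real script):

```python
from elasticsearch import Elasticsearch, helpers

# Placeholder endpoint -- illustrative, not our real cluster.
es = Elasticsearch("http://localhost:9200")

def generate_actions(docs):
    """Yield one bulk action per parsed JSON document."""
    for doc in docs:
        yield {"_index": "logs-2024-w01", "_source": doc}  # hypothetical index name

# In the real job 'docs' would stream ~60M documents from the batch input;
# a tiny inline sample keeps the sketch runnable.
docs = [{"message": "example event", "status": "ok", "bytes": 123}]

# parallel_bulk batches and indexes across several threads; the returned
# generator must be consumed, or nothing is actually sent.
for ok, info in helpers.parallel_bulk(es, generate_actions(docs),
                                      thread_count=4, chunk_size=5000):
    if not ok:
        print("failed:", info)
```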

Is it possible to get good query times with this in Elasticsearch? On a non-SSD cluster (6 nodes) we currently get ~1 sec average query times with one HUGE monolithic index (5 shards) holding ALL 90 days of data.

Since the recommended shard size is 30-60 GB, should we have weekly indexes (instead of one HUGE monolithic 90-day index) with 5 shards each? At 161 GB per week, that works out to roughly 32 GB per shard, which sits inside the recommended range.
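
Concretely, the weekly layout I have in mind would look something like this (a minimal sketch assuming the 8.x Python client; the index names and alias are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# One index per week, 5 primary shards each -> ~161 GB / 5 = ~32 GB per shard.
es.indices.create(
    index="logs-2024-w01",  # hypothetical weekly naming scheme
    settings={"number_of_shards": 5, "number_of_replicas": 1},
    aliases={"logs-90d": {}},  # query all 13 weeks through one alias
)

# Queries would then span every weekly index via the alias (or a wildcard):
resp = es.search(index="logs-90d",
                 query={"match": {"message": "error"}}, size=10)
print(resp["hits"]["total"])
```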

The indexed JSON consists of 16 fields: mostly text, plus 2 keywords and 1 long. Will ES be able to perform? Everyone speaks so highly of ES, yet when I begin to ask questions I get open-ended answers ("It can, but depends on the Hardware"), which is kind of disappointing.
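
For reference, the mapping is shaped roughly like this (only a representative subset of the 16 fields; the field names are made up):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Invented field names -- the real index has 16 fields
# (mostly text, 2 keyword, 1 long).
mappings = {
    "properties": {
        "message": {"type": "text"},
        "details": {"type": "text"},
        "host":    {"type": "keyword"},
        "status":  {"type": "keyword"},
        "bytes":   {"type": "long"},
    }
}
es.indices.create(index="logs-2024-w02", mappings=mappings,
                  settings={"number_of_shards": 5})
```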

Can someone advise - maybe share their own story/journey? Thanks all!

"It can, but depends on the Hardware" is really the only correct response here. With one underpowered node, slow disks and minimal RAM you will see poor performance. With better hardware you'll get better performance. Careful benchmarking of your actual workload will help you find the sweet spot, but it really does depend on the workload and the available hardware.

