Huge database I'm wondering if Elasticsearch can handle



I need to store up to 200 billion records, each with two fields, A and B, both strings of at most 255 characters.
I'll be inserting records into this database at a rate of about 50,000 per second.
About once a second, I'll also be querying the database. All the queries will have the same form: I need all the records where field A=X, for a given string X.

  1. Is it possible to use Elasticsearch to store such a database?
  2. What kind of hardware would I need to store it?
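
For a rough sense of scale, the numbers above can be turned into a back-of-envelope estimate. This is only a sketch: it assumes 1 byte per character and ignores index overhead, replicas, and compression, none of which are specified in the question, so real disk usage will differ.

```python
# Back-of-envelope sizing from the figures in the question.
# Assumptions (not stated in the question): 1 byte per character,
# no index overhead, no replicas, no compression.
MAX_RECORDS = 200_000_000_000   # 200 billion records
FIELDS_PER_RECORD = 2           # fields A and B
MAX_FIELD_BYTES = 255           # upper bound per field at 1 byte/char
INSERTS_PER_SECOND = 50_000


def raw_upper_bound_bytes() -> int:
    """Largest possible raw payload if every field is at its maximum length."""
    return MAX_RECORDS * FIELDS_PER_RECORD * MAX_FIELD_BYTES


def seconds_to_fill() -> float:
    """Time to reach the maximum record count at the stated insert rate."""
    return MAX_RECORDS / INSERTS_PER_SECOND


if __name__ == "__main__":
    print(f"raw upper bound: {raw_upper_bound_bytes() / 1e12:.0f} TB")
    print(f"time to reach the maximum: {seconds_to_fill() / 86_400:.1f} days")
```

Under these assumptions the raw payload tops out at about 102 TB, and the maximum record count is reached after roughly 46 days of continuous inserts.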


(Magnus Bäck) #2

Is this the only way you're going to use the database? Asking the whole dataset the same question every second isn't efficient, since you only need to consider the data that has been inserted since the last query. I'd consider pushing the records to a broker and having a program process each new record and keep track of how many there are with A=X. If events expire, you would have to deal with that in some way too.
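
A minimal sketch of that idea, assuming the broker can be consumed as a stream of (A, B) pairs. A plain Python iterable stands in for the broker here, and `count_by_a` is a hypothetical helper name; expiry of old events is not handled.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_by_a(records: Iterable[Tuple[str, str]], counts: Counter) -> Counter:
    """Update running per-A counts from newly arrived (A, B) records only."""
    for a, _b in records:
        counts[a] += 1
    return counts


if __name__ == "__main__":
    counts: Counter = Counter()
    # Each batch stands in for the records delivered since the last check.
    count_by_a([("x", "b1"), ("y", "b2")], counts)
    count_by_a([("x", "b3")], counts)
    print(counts["x"])  # number of records seen so far with A == "x"
```

Only the new batch is touched on each call, so the cost per update depends on the insert rate rather than on the size of the whole dataset.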

(Mark Walkom) #3

ES can handle this, and using filtered queries will be the optimal way to do it.
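
As an illustration, the request body for an exact-match lookup on A might look like the sketch below. It assumes the field is literally named `A` and is mapped so that it is not analyzed (an exact-value string), neither of which is stated in the thread. The first helper uses the `filtered` query form from Elasticsearch 1.x; the second uses the `bool` query with a `filter` clause that replaced it in later versions. Both helper names are hypothetical.

```python
import json


def filtered_term_query(field: str, value: str) -> dict:
    """Search body in the Elasticsearch 1.x 'filtered' query form."""
    return {"query": {"filtered": {"filter": {"term": {field: value}}}}}


def bool_filter_term_query(field: str, value: str) -> dict:
    """Equivalent search body using bool/filter (Elasticsearch 2.0 and later)."""
    return {"query": {"bool": {"filter": {"term": {field: value}}}}}


if __name__ == "__main__":
    # "A" is the field from the question; "X" is the string being looked up.
    print(json.dumps(filtered_term_query("A", "X"), indent=2))
    print(json.dumps(bool_filter_term_query("A", "X"), indent=2))
```

A filter of this kind matches documents without computing relevance scores, which fits a lookup that only needs the records where A equals a given string.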


I wouldn't be asking the exact same question. X is different on every query.

(Otis Gospodnetić) #5

Yes, ES should be fine with this, if you give it enough/appropriate hardware, of course. These are super small, narrow docs and the cheapest possible queries. Just don't update a large number of docs at once, and try to avoid shard reallocation.


