I need to store a maximum of 200 billion records, each having two fields, A and B, both strings of no more than 255 characters.
I'll be inserting records into this database at a rate of about 50,000 per second.
About once a second, I'll also be querying the database. All the queries will be the same: I'll need all the records where the field A=X, for a given X string.
Is it possible to use Elasticsearch to store such a database?
Is this the only way you're going to use the database? Asking the whole dataset the same question every second isn't efficient, since you only need to consider the data that has been inserted since the last query. I'd consider pushing the records to a broker and having a program process each new record, keeping track of the ones with A=X. If events expire, you'd have to deal with that in some way too.
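A minimal sketch of that broker approach, assuming Kafka with JSON messages carrying the A and B fields; the topic name, bootstrap server, and target value X here are placeholders, not from the original question:

```python
import json
from collections import defaultdict

from kafka import KafkaConsumer  # pip install kafka-python

TARGET_A = "X"  # the value you keep querying for (placeholder)

consumer = KafkaConsumer(
    "records",                          # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw),
)

matches = []                # keep the matching records themselves...
counts = defaultdict(int)   # ...or just per-value counters, if that's enough

for message in consumer:
    record = message.value
    counts[record["A"]] += 1
    if record["A"] == TARGET_A:
        matches.append(record)
        # act on the new match here instead of re-querying the whole dataset
```

The point is that each record is examined exactly once as it arrives, rather than scanning 200 billion stored records every second.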
Yes, ES should be fine with this... if you give it enough/appropriate hardware, of course. These are super small/narrow docs and the cheapest possible queries. Just don't go updating a large number of docs at once, and try to avoid shard reallocation.
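A minimal sketch of what "narrow docs and the cheapest possible queries" could look like, assuming the official Python client (elasticsearch-py 8.x); the index name, host, and sample data are placeholders:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Two exact-match strings of <= 255 chars: map both as keyword.
# B is never searched on, so it doesn't need to be indexed at all.
es.indices.create(
    index="records",
    mappings={
        "properties": {
            "A": {"type": "keyword"},
            "B": {"type": "keyword", "index": False},
        }
    },
)

# Bulk-index incoming records; batching is essential at ~50k docs/s.
batch = [("X", "some value"), ("Y", "other value")]  # placeholder data
helpers.bulk(
    es,
    ({"_index": "records", "_source": {"A": a, "B": b}} for a, b in batch),
)

# The cheapest query for "all records where A == X": a term query on a
# keyword field. Use search_after / point-in-time to page past 10k hits.
hits = es.search(index="records", query={"term": {"A": "X"}}, size=10_000)
```

Whether a single term query over 200 billion docs stays fast once a second depends heavily on sharding and hardware, which is exactly the caveat above.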