I'm now considering which DB I should use, and Elasticsearch is a strong candidate.
I have a few requirements I'm not sure ES can handle. I would appreciate it if someone with experience could confirm it can be done before I choose ES as my solution.
200K document inserts per second (optional: ignore already existing docs; I know I can filter the duplicates at query time)
A cluster that will hold 540 billion documents
Average document size 100 bytes, largest possible size 300 bytes
Queries will return approx. 1.5 million docs as a result; maximum query time 10 seconds
Maximum concurrent queries: 10
My machine specs:
48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Disk space: 21 terabytes
What do I want to know?
Can ES meet the requirements I mentioned, performance-wise, with the machine specs I listed?
I can scale out and add more machines like the one I described.
If anyone needs any clarification or more information in order to help me, please let me know.
At this scale, I would probably ask for Elastic support.
Some numbers: we have customers ingesting 10M docs per second. Not on a single node, for sure.
We have customers who have more than 1000 billion docs in their cluster. Again, not with a single node.
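On the optional requirement of ignoring already existing docs: one common approach is to give every document a deterministic `_id` and send it with the bulk API's `create` action, which refuses to overwrite an ID that already exists. Below is a minimal sketch that only builds the bulk request body. It assumes two documents count as duplicates when their content is identical; the function name `bulk_create_lines` and the index name `events` are made up for illustration.

```python
import hashlib
import json


def bulk_create_lines(index, docs):
    """Build an NDJSON bulk body that uses the `create` action.

    Assumption: a duplicate is a document with identical content, so the
    _id is derived from a hash of the canonical JSON form of the document.
    """
    lines = []
    for doc in docs:
        # Canonical JSON, so key order does not change the hash.
        body = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        doc_id = hashlib.sha1(body.encode("utf-8")).hexdigest()
        # Action line: `create` fails for this item if the _id already exists.
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(body)
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_create_lines("events", [{"a": 1, "b": 2}, {"b": 2, "a": 1}])
```

Both documents above get the same `_id`, so the second one would be reported as a per-item conflict instead of being indexed twice. Keep in mind that ID lookups on every insert cost indexing throughput, so this is worth testing at your peak rate.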
I didn't mean only one node.
I will run ES inside Docker containers and will create as many as I need.
As you can see, my default machine is pretty strong. I will give each ES node the resources it needs in its Docker container and will set up as many nodes as you recommend. I said I can scale out for whatever scenario I need: for example, 3 nodes on this current machine (if one node needs 1/3 of the resources my default machine has). Just to emphasise, if the recommendation is 6 nodes, each with 1/3 of a machine's resources, I have no problem adding another machine, or even a dozen.
About your second clarification:
200K per second is the peak I'll have in a day.
I expect to get around 3 billion a day.
So on average it's 11K per second.
200K per second will be the peak for each day; I assume most of the time it will be less.
Thanks for the responses, I will read the article and watch the clip @dadoonet
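As a quick sanity check on the figures quoted so far, here is a back-of-the-envelope sketch. It uses only numbers from this thread (3 billion docs a day, 540 billion docs in total, 100 bytes on average, 1.5 million docs per query) and ignores index overhead, replicas and compression, so treat the results as rough lower bounds rather than sizing advice.

```python
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_day = 3_000_000_000
total_docs = 540_000_000_000
avg_doc_bytes = 100
docs_per_query = 1_500_000

# Average ingest rate implied by the daily volume.
avg_docs_per_second = docs_per_day / SECONDS_PER_DAY

# Days of data needed to reach the target document count.
days_to_fill = total_docs / docs_per_day

# Raw source size in terabytes (1 TB = 10**12 bytes), before any overhead.
raw_terabytes = total_docs * avg_doc_bytes / 10**12

# Raw payload returned by one query, in megabytes.
query_megabytes = docs_per_query * avg_doc_bytes / 10**6

print(f"average rate : {avg_docs_per_second:,.0f} docs/s")
print(f"days to fill : {days_to_fill:,.0f}")
print(f"raw data     : {raw_terabytes:,.0f} TB")
print(f"per query    : {query_megabytes:,.0f} MB")
```

Two things stand out. 3 billion a day works out to roughly 35K docs per second on average, while 11K per second corresponds to about 1 billion a day, so one of those two figures probably needs a second look. And the raw data alone is around 54 TB, which is more than the 21 TB on a single machine before replicas or index overhead are counted.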
First thing to check: whether your machines can be equipped with SSD drives, several per machine.
With SSD drives you can achieve more than 10,000/s per hardware node, and maybe more, but it depends on many things.
Here, look at this discussion about spinning disks:
Let's assume you get 20,000/s per hardware node (2 Elasticsearch instances in Docker); then you need at least 10 hardware nodes to deal with your peaks. But 200,000/s is a huge number, so other apps and custom tools may run into problems before Elasticsearch does.
But again, 2 billion a day is a huge number. Ideally you start with 1/10 of this on a smaller setup and test how it behaves and scales.
What happens if your cluster can't index 200,000/s for 2 minutes and starts rejecting bulk index requests? How does your application react? Is it a custom app?
Logstash, for example, can deal with that.
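If it is a custom app, the usual way to survive rejected bulk requests is to retry with exponential backoff instead of dropping the batch. Here is a minimal sketch of that idea. It assumes a rejection shows up as an HTTP 429 status; `send_bulk` is a hypothetical callable standing in for whatever client your app uses, and the retry count and delays are illustrative, not recommendations.

```python
import time


def send_with_backoff(send_bulk, batch, max_retries=5, base_delay=0.5,
                      sleep=time.sleep):
    """Send one bulk batch, retrying with exponential backoff on rejection.

    send_bulk: hypothetical callable that sends the batch and returns an
               HTTP-like status code (429 is assumed to mean "rejected").
    """
    for attempt in range(max_retries + 1):
        status = send_bulk(batch)
        if status != 429:
            return status
        if attempt < max_retries:
            # Wait 0.5s, 1s, 2s, ... before trying the same batch again.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("bulk request still rejected after retries")


# Usage with a fake sender that rejects the first two attempts.
responses = iter([429, 429, 200])
delays = []
result = send_with_backoff(lambda batch: next(responses), ["doc1", "doc2"],
                           sleep=delays.append)
```

In a real pipeline you would also need somewhere to buffer incoming data while the sender is waiting (a queue or a message broker), otherwise a 2 minute slowdown at 200,000/s just moves the problem upstream.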