Hi all,
I'm currently deciding which DB to use, and Elasticsearch is a strong candidate.
I have a few requirements I'm not sure ES can handle; I'd appreciate it if someone with experience could confirm they can be met before I choose ES as my solution.
Requirements:
200K document inserts per second (optional: skip already-existing docs; I know I can filter out duplicates at query time)
A cluster that will hold 540 billion documents
Average document size 100 bytes, largest possible size 300 bytes
Queries will return approximately 1.5 million docs; maximum query time: 10 seconds
Maximum concurrent queries: 10
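On the "ignore already existing docs" point: one common approach is to give each document a deterministic `_id` and index with op type `create`, so Elasticsearch rejects duplicates (with a 409) instead of overwriting them. A minimal sketch of building such bulk actions, assuming the elasticsearch-py bulk helpers and a content hash as the id (any stable natural key would work too):

```python
import hashlib
import json

def make_create_action(index, doc):
    """Build a bulk action that inserts the doc only if its _id is new.

    _op_type "create" makes Elasticsearch reject an already-existing _id
    with a 409 instead of overwriting it, so duplicates are skipped at
    index time. The _id here is a content hash of the document -- an
    assumption for illustration; any stable unique key works.
    """
    doc_id = hashlib.sha1(json.dumps(doc, sort_keys=True).encode()).hexdigest()
    return {"_op_type": "create", "_index": index, "_id": doc_id, "_source": doc}

# Two identical docs hash to the same _id, so the second create is rejected
# by the cluster rather than stored twice.
action = make_create_action("events", {"user": 1, "ts": 1234})
print(action["_op_type"])  # create
```

A list of these actions can then be fed to the client's bulk helper; rejected duplicates show up as 409 errors that can simply be ignored.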
My machine specs:
48 × Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
256 GB RAM
21 TB disk space
What I want to know:
Can ES meet the requirements I mentioned, performance-wise, with the machine specs above?
I can scale out and add more machines, as mentioned.
If anyone needs clarification or more information in order to help, please let me know.
At this scale, I would probably ask for Elastic support.
A few numbers: we have customers ingesting 10M docs per second. Not on a single node, for sure.
We have customers with more than 1,000 billion docs in their cluster. Again, not on a single node.
I didn't mean only one node.
I will run ES inside Docker containers and create as many as I need.
As you can see, my default machine is pretty strong. I will give each ES node the resources it needs in its own Docker container and run as many nodes as you recommend. I said I can scale out for the scenario I'll need: for example, 3 nodes on the current machine (if one node needs 1/3 of its resources). To emphasise: if the recommendation is 6 nodes, each with 1/3 of a machine's resources, I have no problem adding another machine, or even a dozen.
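A multi-node-per-machine setup like that is often sketched with Docker Compose. A minimal, illustrative fragment (the image tag, cluster name, and heap size are assumptions; note the heap is kept at 30g, under the compressed-oops limit, with the rest of RAM left to the OS filesystem cache rather than split evenly per container):

```yaml
# Sketch: one of three Elasticsearch nodes on a single 48-core / 256 GB machine.
version: "2.2"
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0  # pick your version
    environment:
      - node.name=es01
      - cluster.name=ingest-cluster            # assumed name
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms30g -Xmx30g"         # heap well under half of RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
  # es02 / es03 follow the same pattern with their own names and data volumes
```

Each node should also get its own data volume on its own disk (ideally SSD) so the containers don't contend for the same spindle.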
About your second clarification:
200K per second is the daily peak.
I expect around 3 billion docs a day,
which averages out to roughly 35K per second.
200K per second will be the daily peak; I assume most of the day will be well below that.
Thanks for the responses, I will read the article and watch the clip @dadoonet
First thing to check: whether your machines can be equipped with SSD drives, several per machine.
With SSDs you can achieve more than 10,000 docs/s per hardware node, maybe more, but it depends on many things.
Here, have a look at a discussion involving spinning disks:
Let's assume you reach 20,000/s per hardware node (2 Elasticsearch instances in Docker); then you need at least 10 hardware nodes to handle your peaks. But 200,000/s is a huge number, and other apps and custom tools can run into problems before Elasticsearch does.
And again, 2 billion a day is a huge number; in the best case you start with 1/10 of it on a smaller set and test how it behaves and scales.
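The sizing arithmetic above can be written down as a back-of-envelope helper. Everything here is an assumption to validate with a real benchmark, including the headroom factor for replicas, merges, and retries:

```python
import math

def nodes_needed(peak_rate, per_node_rate, headroom=1.5):
    """Back-of-envelope hardware-node count: peak indexing rate divided
    by the measured per-node rate, padded by a headroom factor for
    replica traffic, segment merges and retry storms. All inputs are
    assumptions until benchmarked on your own documents and mappings.
    """
    return math.ceil(peak_rate * headroom / per_node_rate)

# 200K docs/s peak, an assumed 20K docs/s sustained per hardware node:
print(nodes_needed(200_000, 20_000, headroom=1.0))  # 10, the bare minimum
print(nodes_needed(200_000, 20_000))                # 15 with 1.5x headroom
```

The point is less the exact number than that the bare-minimum figure leaves no slack; peaks plus replication tend to need the padded one.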
Check this:
More questions:
What happens if your cluster can't index 200,000/s for 2 minutes and starts rejecting bulk index requests? How does your application react? Is it a custom app?
Logstash, for example, can deal with that...