Can Elasticsearch support these requirements?

Hi all,
I'm currently deciding which database to use, and Elasticsearch is a strong candidate.

I have a few requirements I'm not sure ES can handle, and I'd appreciate it if someone with experience could confirm they can be met before I choose ES as my solution.

Requirements:
  • 200K document inserts per second (optional: ignore already-existing docs; I know I can filter the duplicates at query time, but see the sketch after this list)
  • A cluster that will hold 540 billion documents
  • Average document size 100 bytes; largest possible size 300 bytes
  • Queries will return approx. 1.5 million docs as a result; maximum query time: 10 seconds
  • Maximum concurrent queries: 10
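For context, here is roughly what I mean by ignoring already-existing docs, as a minimal sketch (assuming the Python `elasticsearch` client, and assuming each document has a natural unique key to use as its `_id`; the index and field names are made up):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

def dedup_actions(docs):
    # A deterministic _id means a re-insert of the same document hits
    # the same ID; op_type "create" makes ES reject it with a 409
    # conflict instead of overwriting, so duplicates are ignored.
    for doc in docs:
        yield {
            "_op_type": "create",
            "_index": "events",       # hypothetical index name
            "_id": doc["event_id"],   # assumed natural unique key
            "_source": doc,
        }

docs = [{"event_id": "a1", "value": 42}, {"event_id": "a1", "value": 42}]
# raise_on_error=False: treat 409 conflicts (duplicates) as non-fatal.
ok, errors = helpers.bulk(es, dedup_actions(docs), raise_on_error=False)
print(f"indexed {ok} new docs, {len(errors)} duplicates/errors skipped")
```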

My machine specs:
48 × Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz cores
256 GB RAM
21 TB disk space

What I want to know:

  1. Can ES meet the requirements above, performance-wise, with the machine specs I listed?
  2. Can I scale out and add more machines as described?

If anyone needs clarification or more information in order to help, please let me know.

Thanks in advance,
Sharon.

At this scale, I would probably ask for Elastic support.

A few numbers: we have customers ingesting 10M docs per second, though certainly not on a single node.
We have customers with more than 1,000 billion docs in their cluster; again, not with a single node.

Yes, you can (and should) scale out.
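To make "scale out" concrete: in ES the unit of distribution is the shard, so an index created with enough primary shards can later be rebalanced across new nodes. A minimal sketch, assuming the 8.x Python client; the shard and replica counts are illustrative, not a recommendation:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

# Primary shard count is fixed at creation (short of reindexing or the
# split API), so it is sized up front; shards then spread across nodes
# as the cluster grows.
es.indices.create(
    index="events",                  # hypothetical index name
    settings={
        "number_of_shards": 30,      # illustrative: room to spread over many nodes
        "number_of_replicas": 1,     # one extra copy for redundancy
        "refresh_interval": "30s",   # less frequent refresh helps heavy indexing
    },
)
```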

I’d recommend looking at

HTH

Hi, I'm not sure I understood: are you planning to have only one machine?
If so, that's a bad idea and ES will not make you happy here.

First: you need several nodes with good I/O (so choose SSDs).

Read this as a starting point for understanding hardware requirements.

Second: please explain the "200K document inserts per second" figure. Is that all the time, or only sometimes?

I didn't mean only one node.
I will run ES inside Docker containers and create as many as I need.

As you can see, my default machine is pretty strong. I'll give each ES node the resources a single node needs inside its own Docker container, and I'll set up as many nodes as you recommend. When I said I can scale out, I meant it for whatever scenario comes up: for example, 3 nodes on this current machine if each node needs 1/3 of its resources. To emphasize: if the recommendation is 6 nodes, each using 1/3 of a machine's resources, I have no problem adding another machine, or even a dozen. (A rough sketch of the Docker setup I have in mind follows below.)
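Roughly what I have in mind for several nodes on one machine, sketched with the Docker SDK for Python (the image tag, cluster name, and heap size are placeholders; a real setup also needs volumes, ulimits, and vm.max_map_count tuning):

```python
import docker

client = docker.from_env()
client.networks.create("esnet", driver="bridge")  # container names resolve on this network

node_names = ["es01", "es02", "es03"]  # e.g. 3 nodes on the current machine
for name in node_names:
    client.containers.run(
        "docker.elastic.co/elasticsearch/elasticsearch:8.13.0",  # placeholder tag
        name=name,
        detach=True,
        network="esnet",
        environment={
            "node.name": name,
            "cluster.name": "my-cluster",                         # placeholder
            "discovery.seed_hosts": ",".join(node_names),
            "cluster.initial_master_nodes": ",".join(node_names),
            "ES_JAVA_OPTS": "-Xms8g -Xmx8g",                      # placeholder heap per node
            "xpack.security.enabled": "false",                    # demo only
        },
    )
```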

About your second clarification:
200K per second is the peak I'll hit in a day.
I expect to get around 3 billion documents a day, so the average is roughly 35K per second (see the numbers below).
200K per second will be the daily peak; most of the time the rate will be lower.
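For clarity, the arithmetic behind those figures:

```python
docs_per_day = 3_000_000_000          # expected daily volume
avg_per_sec = docs_per_day / 86_400   # seconds in a day
peak_per_sec = 200_000                # expected daily peak

print(f"average: {avg_per_sec:,.0f} docs/s")                      # ~34,722 docs/s
print(f"peak/average ratio: {peak_per_sec / avg_per_sec:.1f}x")   # ~5.8x
```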

Thanks for the responses. I will read the article and watch the clip, @dadoonet.

Ok.

First thing to check: whether your machines can be equipped with SSD drives, several per machine.
With SSD drives you can achieve more than 10,000 docs/s per hardware node, and maybe more, but it depends on many things.
Here, look at this discussion about spinning disks:

Let's assume you can do 20,000 docs/s per hardware node (2 Elasticsearch instances in Docker); then you need at least 10 hardware nodes to deal with your peaks. But 200,000/s is a huge number, so many other apps and custom tools can run into problems before Elasticsearch does.
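The sizing math behind that, as a sketch (the 20,000 docs/s per hardware node is an assumption to test, not a measured figure):

```python
import math

peak_per_sec = 200_000    # stated peak ingest rate
per_node_rate = 20_000    # assumed: 2 ES instances per hardware node, on SSDs
headroom = 1.2            # assumed safety margin for spikes and segment merges

nodes = math.ceil(peak_per_sec * headroom / per_node_rate)
print(f"hardware nodes needed at peak: {nodes}")  # 12 (10 with no headroom)
```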

But again, 3 billion a day is a huge number; ideally you'd start with 1/10 of that on a smaller setup and test how it behaves and scales.

Check this:

More questions:

  • What happens if your cluster can't index 200,000 docs/s for 2 minutes and starts rejecting bulk index requests? How does your application react? Is it a custom app?
    Logstash, for example, can deal with that (sketch below).
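What "dealing with that" looks like in a custom app, as a minimal sketch assuming the Python `elasticsearch` bulk helpers (index name, batch size, and retry numbers are illustrative):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint

def actions():
    for i in range(1_000_000):
        yield {"_index": "events", "_source": {"n": i}}  # hypothetical docs

# streaming_bulk retries items rejected with HTTP 429 (bulk queue full),
# backing off exponentially between attempts instead of losing data.
for ok, item in helpers.streaming_bulk(
    es,
    actions(),
    chunk_size=5_000,       # illustrative batch size
    max_retries=5,          # retry rejected items up to 5 times
    initial_backoff=2,      # seconds before first retry, doubled each attempt
    max_backoff=60,
    raise_on_error=False,   # yield failures instead of raising
):
    if not ok:
        print("permanently failed item:", item)
```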
