High performance

How can I increase the performance of ES? Is it possible to build a high-performance cluster, or does a cluster give me "only" data security / availability? What matters most to me is not the data stored in ES, but the ability to run queries on the latest data.

An additional question: how can I measure the EPS (events per second) capacity of my ELK stack or something like it, i.e. how "strong" is my cluster?

Currently I am using 2 x 12-thread CPUs, 64 GB RAM and an SSD drive.

How "slow" is it today?
What is the target number you must meet?

Do you have any numbers, query and response to share here?

To be honest, I tested with a "worse" PC:
12 cores, 16 GB RAM
and with the same data I see "visibly" better performance on this one:
24 cores, 64 GB RAM.

So I am just trying to figure out what to do when I have much higher EPS.

If you really have search issues as you get more and more queries, you can increase read throughput by adding more replicas to your index. That means more nodes.

For example, if you have a 10-node cluster, you can set the number of replicas to 9.
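As a minimal sketch of what changing the replica count looks like, the snippet below builds (but does not send) a `PUT <index>/_settings` request with `index.number_of_replicas`. The base URL, the index name `my-index` and the helper name `build_replica_request` are assumptions made up for illustration; authentication and TLS are left out.

```python
import json
import urllib.request


def build_replica_request(base_url, index, replicas):
    """Build (without sending) a PUT <index>/_settings request.

    base_url and index are placeholders; adapt them to your cluster.
    """
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    # Settings body understood by the update index settings API.
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical 10-node cluster: 1 primary copy + 9 replicas per shard.
    req = build_replica_request("http://localhost:9200", "my-index", 9)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply it: urllib.request.urlopen(req)
```

Replicas are only allocated when there are enough nodes to hold them, so on a smaller cluster a high replica count would simply leave shards unassigned.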

Are you planning to run your service on your own machines or on the internet? If the latter, I'd just try cloud.elastic.co and see how fast it works for you.

Of course, there are many sources of optimization, like having the best mapping for your use case, writing queries in an efficient way... But I can't really tell you more here because I have no idea what you are doing.

May I suggest you look at the following resources about sizing? Although old, it is still very accurate.

And Using Rally to Get Your Elasticsearch Cluster Size Right | Elastic Videos

Finally, you can attend one of our trainings:

Thanks for all this information.

If I have a full E+L+K stack on a single server, can I set up a second E+L stack on a different server so that it works like this? ->

  • on the first server, L reports to the local E, and the (local) K is used to view dashboards etc.
  • on the second server, L reports to the local E, which is somehow (cluster?) connected to E on the first server and shares data, so that data from both E instances can be viewed in K on the first server?

Sorry for the entry-level questions, but I still do not fully understand how to deal with clusters.

Ideally, run one single Elasticsearch instance per machine, with no other service running on it.
Use as many nodes as your use case requires.

All nodes form a cluster.

Put Kibana on another machine. It can talk to any node of the cluster.

But have a look at the links I shared. I think most of the answers are there.

What are you using Logstash for?

With Logstash I am parsing my own logs, which are in JSON, into ES.

If I have 2 x ES in a cluster, can each ES node hold dedicated indices generated by its local Logstash, for example index-1 on ES instance #1 and index-2 on ES instance #2, and will both be visible from both ES nodes once they are connected in one cluster?

Did you try to do that with an ingest pipeline instead?
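For reference, a rough sketch of what such a pipeline could look like, assuming each log line arrives as a JSON string in a field called `message` (that field name, the pipeline id `parse-json-logs` and the helper name are made up for illustration). It only builds the body you would send with `PUT _ingest/pipeline/<id>`; it does not contact a cluster.

```python
import json


def build_json_pipeline(source_field="message"):
    """Return an ingest pipeline body that parses a JSON string field.

    source_field is an assumption about where the raw log line is stored.
    """
    return {
        "description": "Parse a JSON log line stored in a string field",
        "processors": [
            # The json processor decodes the string; add_to_root puts the
            # decoded keys at the top level of the document.
            {"json": {"field": source_field, "add_to_root": True}}
        ],
    }


if __name__ == "__main__":
    # Body for: PUT _ingest/pipeline/parse-json-logs  (hypothetical id)
    print(json.dumps(build_json_pipeline(), indent=2))
```

Whether this can replace Logstash depends on what else Logstash does for you besides JSON parsing.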

You must have 3 nodes at least to avoid split brain issues.

Then Elasticsearch will decide on which node the data is allocated, whichever node you are connected to.

So is it better NOT to build a cluster if I have only 2 servers, and just use one ES in my ELK stack?

If you don't care about High Availability and data integrity, you can use one node only, or 2.

Quote from docs:

High availability (HA) clusters require at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster will be able to elect a master node even if one of the nodes fails.

You only need three nodes for the high availability side of this. All sizes of cluster will protect the integrity of your data, but might reject some requests if the cluster is not HA and a node fails.
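The "three nodes" figure follows from simple majority arithmetic. Here is a small sketch, assuming a plain majority-vote model of master election (a simplification; the real voting configuration logic in Elasticsearch has more details, and the function names are made up):

```python
def majority(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many nodes can fail while a majority still remains."""
    return master_eligible_nodes - majority(master_eligible_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes: majority", majority(n),
              "- tolerated failures", tolerated_failures(n))
```

Under this model, two nodes tolerate no more failures than one node does, which is why a third master-eligible node is what buys you high availability.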

Thanks.

I meant that in the past, when the cluster was split (split brain) and your application sent data to one node or the other, you might have ended up with indices that did not contain the same data.

That might not be the case nowadays anymore.

Thanks for information!

But in terms of performance, is there a difference between having:

  • server #1: full ELK stack
  • server #2: only Logstash, which reports to ES on server #1
    or
  • server #1: full ELK stack
  • server #2: Logstash + ES in a cluster with ES from server #1

I need to have Logstash, which is the software that takes care of my data and parses/transports it to ES.

Is one of these solutions better for performance? I think the second approach is better in theory, but in practice the difference may not be big. Additionally, this "problem of only two nodes" may be worse than the slightly lower performance of the first solution.

Please advise :), because currently I can only have 2 x ES; if I can get 3 x ES, I will build a cluster for sure :).

I totally understand the problem of high AVAILABILITY; it is the same as with MySQL databases. But my biggest problem is PERFORMANCE :). If I lose data, it will not be the worst thing: my system is for real-time analysis, so old data will be gone and we will collect new data live. That is fine in a critical situation such as a problem with a server.

So my question is more about performance. Is it (much) better to build a cluster of 2 ES nodes than to have a single ES node in this case? :)

If you are looking for performance, you need at least to make sure that only Elasticsearch runs on the machine you have. Don't run any other service on the same machine.
You can start with a single node and check how good the performance is for your use case.
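One rough way to put a number on indexing throughput (EPS) is to sample the document count twice, for instance from the `count` value returned by `GET /<index>/_count`, and divide the difference by the elapsed time. A minimal sketch of that arithmetic, with a made-up helper name and made-up sample values:

```python
def events_per_second(count_start, count_end, elapsed_seconds):
    """Average indexing rate between two document-count samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if count_end < count_start:
        raise ValueError("count went down (deletes or a different index?)")
    return (count_end - count_start) / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    print(events_per_second(1_000, 7_000, 60))
```

This only gives an average over the sampling window and says nothing about query latency; a benchmarking tool such as Rally is the more thorough option.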