Elasticsearch performance with some real-world values

Hi everyone,

My name is Alex and I'm a full-stack developer. I have to host an Elasticsearch cluster whose main role will be to serve data to Java client applications. I've noticed that a lot of things are well documented for ES, but I didn't find many explanations of how to define the VM hardware and the cluster configuration to match a given need.

That said, this article is well written and made things clearer for me: https://www.elastic.co/blog/found-sizing-elasticsearch

Let me explain what we are expecting :smile:

  • We have an index of fewer than 1 million documents.
  • We will have 3 VMs available for the project (1 acting as a reverse proxy that authenticates queries through HAProxy, and 2 each hosting an ES node).

In theory, we have a bit more than 9,000 client apps in the wild, each making at most 100 requests per day over a 12-hour window. So:
9,000 * 100 = 900,000 requests/day
900,000 / 12 = 75,000 requests/hour -> 75,000 / 3,600 ≈ 21 requests/second (max)
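
The arithmetic above can be double-checked with a short script. It is only a sketch: the names are made up for illustration, the figures are the ones quoted in this post, and it assumes the load is spread evenly over the 12-hour window, so real bursts could be higher.

```python
# Rough load estimate using the figures quoted in the post.
CLIENTS = 9_000            # client apps in the wild
REQUESTS_PER_CLIENT = 100  # maximum requests per client per day
ACTIVE_HOURS = 12          # window in which the requests arrive


def average_rate(clients: int, per_client: int, hours: int) -> tuple[int, float, float]:
    """Return (requests/day, requests/hour, requests/second)."""
    per_day = clients * per_client
    per_hour = per_day / hours
    per_second = per_hour / 3600
    return per_day, per_hour, per_second


per_day, per_hour, per_second = average_rate(CLIENTS, REQUESTS_PER_CLIENT, ACTIVE_HOURS)
print(f"{per_day} requests/day, {per_hour:.0f} requests/hour, {per_second:.1f} requests/second")
```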

The queries we will run are basic search operations (no aggregations, no intelligence, ...): only full-text search and wildcard matching on one or more fields.
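
To make this concrete, here is roughly what such a request body could look like. It is only a sketch: the field names `title` and `description` and the search terms are placeholders, not our real mapping.

```python
import json


def build_search_body(text: str, pattern: str) -> dict:
    """Build a query DSL body mixing a full-text clause and a wildcard clause.

    `title` and `description` are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-text search across more than one field.
                    {"multi_match": {"query": text, "fields": ["title", "description"]}},
                    # Wildcard match on a single field.
                    {"wildcard": {"title": {"value": pattern}}},
                ],
                # A document must match at least one of the two clauses.
                "minimum_should_match": 1,
            }
        }
    }


body = build_search_body("quarterly report", "rep*")
print(json.dumps(body, indent=2))
```

Patterns that start with a wildcard tend to be much slower than prefix-style ones like `rep*`, which matters when estimating the load.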

We plan to give each ES VM 4 GB of RAM, a standard 2-core CPU and 20 GB of storage, running CentOS. These VMs will be dedicated to running ES.

Is this config consistent with our approximate maximum load? How many primary and replica shards should we create for the index? Is a 2-node cluster safe from the split-brain issue if our HAProxy handles load balancing and failover at a higher level?

Thanks a lot for reading this and for your pro tips!
PS: I :heart: Elasticsearch, but I'm quite a noob when it comes to building a complete cluster optimized for our needs.

That's not ideal, you want 3 nodes for HA and redundancy.

The best way to tell would be to build your cluster and then run Rally (elastic/rally on GitHub, a macrobenchmarking framework for Elasticsearch) with your data and queries.

Depends, but with a dataset that small you can run a complete set on each node.
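
As a sketch of what that could look like: one primary shard plus one replica per remaining node gives every data node a full copy of the index. The body below is the kind of settings block you'd send when creating the index. The helper name is made up, and the values are an assumption based on the numbers in the question rather than a tested recommendation.

```python
import json


def index_settings(data_nodes: int) -> dict:
    """Settings that place one full copy of the index on every data node."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    return {
        "settings": {
            "index": {
                # Assumed to be enough for an index of under 1 million documents.
                "number_of_shards": 1,
                # One replica for each node that does not hold the primary.
                "number_of_replicas": data_nodes - 1,
            }
        }
    }


print(json.dumps(index_settings(2), indent=2))
```

With 2 data nodes this yields 1 primary and 1 replica; with 3 nodes it yields 1 primary and 2 replicas.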

Nope, see my initial comment.
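
To spell out why a proxy in front doesn't change this: electing a master needs a majority of the master-eligible nodes, and that vote happens inside the cluster regardless of how HAProxy routes client traffic. A quick sketch of the arithmetic (the function names are just for illustration):

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return master_eligible - quorum(master_eligible)


for nodes in (2, 3):
    print(f"{nodes} nodes: quorum={quorum(nodes)}, can lose {tolerated_failures(nodes)}")
# 2 nodes: quorum=2, can lose 0
# 3 nodes: quorum=2, can lose 1
```

On Elasticsearch versions before 7.0, this majority is the value to use for `discovery.zen.minimum_master_nodes`; newer versions manage it automatically.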


Hi Mark,

Thanks for the response. With your message and my own research, my picture of this project has matured. I went for 2 HAProxy VMs and 3 ES VMs for my total cluster.

And I will give Rally a shot. I was trying to configure Burp Proxy to do my tests, but it's kind of a pain in the ass with the security layer I've added.

Have a nice day,
Alex
