ElasticSearch performance with some real world values

Vespira · April 13, 2017, 3:35pm

Hi everyone,

My name is Alex and I'm a full-stack developer. I have to host an ElasticSearch cluster whose main role will be to serve data to Java client applications. I've noticed that a lot of things are well documented for ES, but I didn't find much explanation of how to choose the VM hardware and cluster configuration to meet a given need.

However, I have to say that this article is well written and made things clearer for me: https://www.elastic.co/blog/found-sizing-elasticsearch

Let me explain what we are expecting.

We have an index of fewer than 1 million documents.
We will have 3 VMs available for the project (1 acting as a reverse proxy that authenticates queries through HAProxy, and 2 each hosting an ES node).

In theory, we have a bit more than 9,000 client apps in the wild, and each will make at most 100 requests per day, within a 12-hour window. So:
9,000 * 100 = 900,000 requests/day
900,000 / 12 = 75,000 requests/hour -> 75,000 / 3,600 ≈ 21 requests/second (max)
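
The arithmetic above can be checked with a short sketch. The function and variable names are hypothetical, and the calculation assumes the requests are spread evenly over the 12-hour window, so the result is an average over that window; short bursts could be higher.

```python
# Back-of-the-envelope request rate, using the figures from the post.
# Assumption: load is spread evenly across the active window.

CLIENTS = 9000                      # client apps in the wild
REQUESTS_PER_CLIENT_PER_DAY = 100   # stated per-client maximum
ACTIVE_HOURS = 12                   # daily window in which requests arrive


def request_rates(clients: int, per_client: int, hours: int):
    """Return (requests per day, per hour, per second) for an even spread."""
    per_day = clients * per_client
    per_hour = per_day / hours
    per_second = per_hour / 3600
    return per_day, per_hour, per_second


per_day, per_hour, per_second = request_rates(
    CLIENTS, REQUESTS_PER_CLIENT_PER_DAY, ACTIVE_HOURS
)
print(f"{per_day} requests/day")
print(f"{per_hour:.0f} requests/hour")
print(f"{per_second:.1f} requests/second")
```

Changing `ACTIVE_HOURS` shows how sensitive the estimate is to the window: the same daily volume squeezed into fewer hours raises the per-second rate proportionally.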

The queries we will run are basic search operations (no aggregations, no intelligence, ...): only full-text search and wildcard queries on one or more fields.

We plan to give each ES VM 4 GB of RAM, a standard 2-core CPU and 20 GB of storage, running CentOS. These VMs will be dedicated to running ES.

Is this config consistent with our approximate maximum load? How many primary and replica shards should we create for the index? Is a 2-node cluster safe from the split-brain issue if our HAProxy does the load balancing and the failover at a higher level?

Thanks a lot for reading this and for your pro tips!
PS: I love ElasticSearch, but I'm quite a noob when it comes to building a complete cluster optimized for our needs.

warkolm · April 17, 2017, 5:57am

That's not ideal, you want 3 nodes for HA and redundancy.

The best way to tell would be to build your cluster and then run Rally (elastic/rally on GitHub, a macrobenchmarking framework for Elasticsearch) with your data and queries.

Depends, but with a dataset that small you can run a complete set on each node.
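
One way to read "a complete set on each node" is a single primary shard plus enough replicas that every data node holds a full copy of the index. The sketch below builds an index-creation body on that assumption; the helper name is hypothetical, and `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings.

```python
import json


def full_copy_settings(data_nodes: int) -> dict:
    """Index settings that place one full copy of the data on each node.

    Assumption: the index is small enough to live in a single primary shard.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    return {
        "settings": {
            "number_of_shards": 1,                  # one primary shard
            "number_of_replicas": data_nodes - 1,   # one replica per other node
        }
    }


# Body that could be sent when creating the index on a 3-node cluster.
print(json.dumps(full_copy_settings(3), indent=2))
```

With this layout any node can answer a search from its local copy, at the cost of every write being applied on every node.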

Nope, see my initial comment.
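
The reply does not spell out why 3 nodes are needed, but the usual reasoning is majority quorum: a master can only be elected by more than half of the master-eligible nodes. A minimal sketch of that arithmetic follows, with hypothetical function names. In Elasticsearch versions of this era the resulting quorum is what the `discovery.zen.minimum_master_nodes` setting was normally set to.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for nodes in (2, 3):
    print(
        f"{nodes} nodes: quorum = {quorum(nodes)}, "
        f"can lose {tolerated_failures(nodes)} node(s)"
    )
```

With 2 nodes the quorum is 2, so either the cluster stops electing a master when one node is lost, or the quorum is lowered to 1 and both halves of a network partition can elect their own master. With 3 nodes the quorum is still 2, so one node can fail safely. A load balancer in front of the cluster does not change this, because the election happens between the ES nodes themselves.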

Vespira · April 19, 2017, 9:04am

Hi Mark,

Thanks for the response. With your message and my own research, my picture of this project has matured. I went for 2 HAProxy VMs and 3 ES VMs for the whole cluster.

And I will give Rally a shot. I was trying to configure Burp Proxy to do my tests, but it's kind of a pain in the ass with the security layer I've added.

Have a nice day,
Alex

system · May 17, 2017, 9:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
