Scaling issues

thosfelt · May 30, 2017, 4:26pm

I recently built a cluster with 29 hosts. 3 of the hosts have dual 20-core CPUs with 512 GB of memory. On each of those 3 hosts I set up a master instance with 32 GB of memory and 2 CPUs, plus 4 ingest instances with 32 GB of memory and 4 CPUs each. The remaining 26 hosts have dual 14-core CPUs with 512 GB of memory and six 1.6 TB SSDs. Each of those hosts has 3 data instances with 32 GB of memory and 8 CPUs each, with two of the SSDs dedicated to each instance.

Using a stress test tool written in Python I get about 500K docs/sec (10 TB/hour) throughput. I'm fairly happy with those results, but I have 78 spare hosts (for a short while) with the same specs as the 26 I'm already using, so I wanted to see how ES scaled. I set up the 78 spare hosts the same way as the data hosts. The problem is that the cluster went to crap. I'm now only getting about 30K docs/sec after scaling out, and I'm seeing a lot of nodes drop out of the cluster and then rejoin. Eventually one of the master nodes fails and the cluster basically freezes.

I'm running the same stress test against both the 78 data node cluster and the 312 data node cluster, but the 312 data node cluster fails miserably. I'm not sure where to look at this point.
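For anyone checking the numbers, here is a rough back-of-the-envelope sketch in Python. It only restates figures from the post; the variable and function names are illustrative, "10 TB/hour" is assumed to mean decimal terabytes, and the 30K docs/sec figure is assumed to apply to the full 312 data node cluster.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the post.
# Names are illustrative; decimal terabytes (10**12 bytes) are an assumption.

DATA_INSTANCES_PER_HOST = 3   # data instances per data host
ORIGINAL_DATA_HOSTS = 26      # hosts in the original cluster
SPARE_HOSTS = 78              # spare hosts added for the scaling test


def data_nodes(hosts, per_host=DATA_INSTANCES_PER_HOST):
    """Number of data instances for a given number of data hosts."""
    return hosts * per_host


def per_node_rate(docs_per_sec, nodes):
    """Average indexing rate per data node, in docs/sec."""
    return docs_per_sec / nodes


small_nodes = data_nodes(ORIGINAL_DATA_HOSTS)                # 26 * 3
large_nodes = data_nodes(ORIGINAL_DATA_HOSTS + SPARE_HOSTS)  # 104 * 3

small_rate = per_node_rate(500_000, small_nodes)
# Assumption: the 30K docs/sec figure was measured on the larger cluster.
large_rate = per_node_rate(30_000, large_nodes)

# Average document size implied by 500K docs/sec at 10 TB/hour.
avg_doc_bytes = 10e12 / (500_000 * 3600)

print(f"data nodes: {small_nodes} -> {large_nodes}")
print(f"docs/sec per data node: {small_rate:.0f} -> {large_rate:.0f}")
print(f"implied average doc size: {avg_doc_bytes:.0f} bytes")
```

Under those assumptions the per-node rate falls from roughly 6,400 docs/sec to under 100 docs/sec, so the larger cluster is not just failing to scale linearly; total throughput is far below what the original 78 data nodes delivered on their own.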

