Cluster Horizontal Scaling

Zhengcong_Yin · November 28, 2017, 5:33pm

Elasticsearch (ES) is designed for horizontal scaling. Does that mean that adding more nodes to the cluster makes search faster?
I executed the same query on a single node and on a cluster containing 3 master, 4 data, and 1 client node. However, the query seems slower on the cluster, judging by the "took" value in the response of each query.

I'd really appreciate any help.

Zhengcong

Christian_Dahlqvist · November 28, 2017, 5:52pm

This will depend on what limits the performance of your query and whether it is able to benefit from the additional system resources available in the cluster. When you talk about performance, are you referring to query latency or query throughput?

What is your setup? How much data? How many shards? What type of queries?

Zhengcong_Yin · November 28, 2017, 7:13pm

Thanks for your prompt reply.

I have 43 GB of address data on a machine with a 2.2 GHz Xeon E5 CPU, 16 GB of RAM, and a regular hard drive. I use it for three kinds of geocoding queries: a bool query that first filters on postal code and house number (as keywords) and then matches the street name as a string; a k-nearest-neighbor search from a given latitude/longitude; and an aggregation by geohash.
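To make the three query types concrete, here is a sketch of what their request bodies might look like, built as plain Python dicts. The field names `postal_code`, `house_number` (keyword), `street_name` (text) and `location` (geo_point) are hypothetical, since the post does not show the mapping, and the helper names are illustrative only. The clauses themselves (`bool`/`filter`/`term`, `match`, `geo_distance`, the `_geo_distance` sort, and the `geohash_grid` aggregation) are standard Elasticsearch query DSL.

```python
import json

# Hypothetical mapping: "postal_code" and "house_number" are keyword fields,
# "street_name" is a text field, "location" is a geo_point field.

def geocode_query(postal_code, house_number, street):
    """Bool query: exact keyword filters first, then a scored match on the street."""
    return {
        "query": {
            "bool": {
                # Filter clauses do not contribute to scoring and can be cached.
                "filter": [
                    {"term": {"postal_code": postal_code}},
                    {"term": {"house_number": house_number}},
                ],
                # The match clause ranks the remaining candidates by street-name similarity.
                "must": [{"match": {"street_name": street}}],
            }
        }
    }

def nearest_query(lat, lon, k, radius="1km"):
    """k-nearest lookup: keep documents within `radius`, sort by distance, return the top k."""
    point = {"lat": lat, "lon": lon}
    return {
        "size": k,
        "query": {
            "bool": {
                # Bounding the candidates keeps the distance sort from touching every document.
                "filter": [{"geo_distance": {"distance": radius, "location": point}}]
            }
        },
        "sort": [{"_geo_distance": {"location": point, "order": "asc", "unit": "m"}}],
    }

def geohash_agg(precision):
    """Bucket all documents into geohash cells of the given precision (1-12)."""
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "aggs": {"cells": {"geohash_grid": {"field": "location", "precision": precision}}},
    }

if __name__ == "__main__":
    print(json.dumps(geocode_query("10001", "350", "5th Avenue"), indent=2))
```

The radius filter in `nearest_query` is an assumption added for illustration; without it, the distance sort has to be computed for every document in the index.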

I created 4 shards with no replicas, so in the four-data-node cluster each node holds one shard. I tracked the "took" field of each query and found that the average response time over 1 million queries was slower on the cluster than on a single node (acting as master, data, and client node).
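A single average can hide differences in the tail, so it may help to compare the distribution of "took" values (the server-side execution time in milliseconds, which excludes network transfer to the client) between the two setups. The sketch below assumes the responses have already been parsed into dicts; the helper name `summarize_took` is hypothetical. The index settings dict simply mirrors the 4-shard, 0-replica layout described above.

```python
import math
import statistics

# Index layout from the post: 4 primary shards, no replicas.
# The primary shard count is fixed when the index is created.
INDEX_SETTINGS = {"settings": {"number_of_shards": 4, "number_of_replicas": 0}}

def summarize_took(responses):
    """Summarize the "took" times (milliseconds) of parsed search responses."""
    took = sorted(r["took"] for r in responses)
    n = len(took)

    def pct(p):
        # Nearest-rank percentile: the ceil(p/100 * n)-th smallest value.
        rank = max(1, math.ceil(p / 100 * n))
        return took[rank - 1]

    return {
        "count": n,
        "mean": statistics.mean(took),
        "p50": pct(50),
        "p99": pct(99),
    }
```

Running the same query set against both setups and comparing p50 and p99, not just the mean, shows whether the cluster is uniformly slower or only slower in the tail.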

Christian_Dahlqvist · November 28, 2017, 7:21pm

With an increasing number of shards and nodes, more data needs to be moved between the nodes, and this may increase latency. I would, however, expect you to be able to handle a higher concurrent query throughput with the larger cluster.

Benchmark with as realistic a load as you can (in both query type and volume). You can tune the cluster differently depending on whether you are looking for low latency with a few concurrent queries or for a very high number of concurrent queries.
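The latency/throughput distinction can be measured directly by replaying the same query set at several concurrency levels and recording both numbers. A minimal sketch, assuming a blocking `search` callable (for example a wrapper around an HTTP client); the function name `benchmark` and the 10 ms stub are hypothetical stand-ins, not a real client API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(search, queries, concurrency):
    """Send `queries` through `search` using `concurrency` worker threads.

    Returns the mean per-query latency (seconds) and the overall
    throughput (queries per second) for the whole run.
    """
    queries = list(queries)

    def timed(q):
        start = time.perf_counter()
        search(q)  # blocks until the response arrives
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_qps": len(latencies) / wall,
    }

if __name__ == "__main__":
    # Stand-in for a real search call: every query "takes" 10 ms.
    def fake_search(q):
        time.sleep(0.01)

    for c in (1, 4, 16):
        print(c, benchmark(fake_search, range(64), c))
```

Threads are sufficient here because each worker spends its time waiting on I/O. With the stub, throughput grows with concurrency while latency stays flat; against a real cluster, throughput levels off and latency starts to rise once CPU, disk, or search threads are saturated, and that saturation point is where a larger cluster is expected to help.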

Zhengcong_Yin · November 28, 2017, 7:30pm

Actually, my data is fairly static: once it is indexed, I will not add more to it.

If I index more data into both the single node and the cluster, would the single node then become slower than the cluster?

So is it fair to say that the cluster cannot improve the latency of a single query, but it does increase the capacity to handle highly concurrent queries? Also, for the cluster, do I need to change the thread pool settings to improve its concurrency capacity?

I really appreciate your time and efforts regarding this.

system · December 26, 2017, 7:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
