More shards = more throughput?

I've been trying to better understand the effect sharing has on performance but the theory doesn't seem to match the reality so clearly one of those things must be wrong :wink:

After watching this video on Quantitative Cluster Sizing where an increase to the number of shards seems increase throughput -- which in my limited understanding seems to make sense.

My setup is about as vanilla/reliable as I can go: 3 node cluster running on elastic.co (compute optimised) and using a k6 loadtest client running a simple loop to post new documents as fast as it can.

I start the test with a single partition index and push the script to the point where response times start to degrade. I then delete the index and recreate it with two partitions.

What seems to happen is that a 2-partition index consistently performs no better (and sometimes worse) than the single partition. This seems counter intuitive to me but perhaps I am missing something?

Thanks!

How many concurrent connections are you using? What is your bulk size? What is the size of your documents? Do you have 3 data nodes in your cluster or 2 data nodes plus an arbiter?

There's no bulking as such, I'm just making a single index request over http as fast as the k6 loadtest tool will allow me to do. I realise this isn't the optimal way but was curious to see how far this could get me.

Is an arbiter equivalent to master? I have a three nodes across three AWS zones where one of them is a master (and one is called a Tiebreaker.)

Thanks

If you are sending single documents using a single thread you are likely limited by the load testing tool rather than Elasticsearch, which is probably why you will get the same results no matter how many shards you have. I would recommend setting up a more realistic test and suspect this webinar might be useful.

When it comes to optimal number of shards the results will vary depending on use case, but I would expect one or two active shards per node would be sufficient to saturate throughput and would expect throughput to potentially start decreasing after that. More active shards does in general not result in better throughput.

Agreed that the tool is probably part of the reason but noticed that when I increased the shards I did see a proportional increase in search throughput so it seems the cluster is at least partially responsible.

In the below image I ran the same test with 1 and 2 shards respectively and you can see how searches goes up but indexing actually goes down. I thought it might be my local disk that was the bottleneck which I why I wanted to try the same test on the official Elastic cluster but I found I got the same results.

Do you have any thoughts on why this might be?

Thank you for the link I will definitely watch the webinar a bit later!

Search scales differently compared to indexing. Queries against multiple shards can be executed in parallel which means having 2 shards is likely to give a performance gain compared to a single primary shard, at least for low query concurrency.

If that testing tool does not do multiple concurrent requests, its not testing a realistic scenario.

Once you try concurrent single document writes/indexing, you are going to hit a limit with the indexing buffer far sooner than you would think. At that point, if you cant tune the index buffer, the only other options are to bulk index everything possible, and if that or throwing more hardware at it isnt enough. stick a load limiting/regulating queue or service in front of ES to receive and batch up the requests to ES

If that testing tool does not do multiple concurrent requests

Yes k6 performs concurrent http requests.

you are going to hit a limit with the indexing buffer far sooner than you would think

Perhaps I am hitting this buffer but isn't that a per node limitation? My query is around comparing the throughput of a single node/single shard with two.

Indexing single documents is quite inefficient as data is synced to disk per request which results in a lot of disk I/O, which is often what limits Indexing throughput in Elasticsearch.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.