Sharding is important with small dataset?

diego.bernardes · June 23, 2017, 4:16pm

My dataset is very small (+- 7gb) and our machines are big enough to keep everything on memory (32gb machines), but we do tons of queries on this dataset. I'm trying to find the the best cluster setup for this scenario.

Should we keep the default 5 shard configuration or go down to 1 shard? The shard is important if your server can't keep all the dataset on memory, right? In our case, we can keep on memory.

Or should i keep the 5 shards, but stay with them in all machines?

The idea is if everything in the same shard, we has less wast of time with shards data merging during query.

dadoonet · June 23, 2017, 4:32pm

I'd go with 1 shard only (and test) but I'd increase the number of replicas so your cluster will be able to serve more distributed searches.

diego.bernardes · June 23, 2017, 4:42pm

Gonna do that, and yes, today we have several replicas, but with lots of shards and this small dataset.

Thanks!

system · July 21, 2017, 4:42pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.