Scaling up for petabyte sizes?


I had a performance problem: ES queries became really slow when the dataset grew to several petabytes. What approach can I use to scale up for larger datasets while preserving the original data? For example, is it possible to increase the number of primary shards on a running ES cluster?
Thank you.

(Christian Dahlqvist) #2

How much data did you have? How many nodes? Did you identify what was limiting performance (CPU, memory, network, disk)?


3 nodes / 3 primary shards. I don't see any resource overutilization as such; the most limiting factor is probably disk usage: ES data takes ~80% of the available disk space.

(Christian Dahlqvist) #4

How is this related to the 3 node cluster?


Sorry, I'm not sure I understand.


Here's the general question: is it possible to increase the cluster size or the shard count while preserving the existing data?

(Christian Dahlqvist) #7

You said you had performance problems when the dataset grew to several petabytes. That is clearly not possible with the 3 nodes you then mentioned, which leaves me confused.


Sorry, my bad. I meant 3 replicas.

(Christian Dahlqvist) #9

I still do not understand. Can you please clarify? How much data did you have in the cluster? How many nodes were used?


Close to 2 petabytes of data, 3 nodes.

(Christian Dahlqvist) #11

That is not possible. Are you mixing up your units? Is it by any chance 2 terabytes?
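
A rough calculation with the figures quoted in this thread shows why they look inconsistent. This is only a sketch: it assumes decimal units (1 PB = 1000 TB) and data spread evenly across the nodes, neither of which was stated in the thread.

```python
# Back-of-the-envelope check using the numbers mentioned in the thread.
# Assumptions (not from the thread): decimal units, even data distribution.
total_data_tb = 2 * 1000        # "close to 2 petabytes"
nodes = 3                       # "3 nodes"
disk_used_fraction = 0.80       # "~80% of available disk space"

data_per_node_tb = total_data_tb / nodes
disk_per_node_tb = data_per_node_tb / disk_used_fraction

print(f"data per node: {data_per_node_tb:.1f} TB")
print(f"implied disk per node: {disk_per_node_tb:.1f} TB")
```

Under these assumptions each node would hold roughly 667 TB on a disk of about 833 TB, which is presumably why the figure was questioned.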

The easiest way to determine the amount of data is probably to provide us with the full output of the cluster stats API.
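
For readers who want to pull the same numbers themselves, here is a minimal sketch of calling `GET /_cluster/stats` and summarizing the result. It assumes an unsecured cluster reachable at `http://localhost:9200`; the helper names `fetch_cluster_stats`, `human_size`, and `summarize` are hypothetical and not part of any Elasticsearch client.

```python
import json
from urllib.request import urlopen


def fetch_cluster_stats(base_url="http://localhost:9200"):
    """Call GET /_cluster/stats and return the parsed JSON body.

    base_url is an assumption; adjust it (and add authentication) as needed.
    """
    with urlopen(f"{base_url}/_cluster/stats") as resp:
        return json.load(resp)


def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def summarize(stats):
    """Pick out the fields that answer 'how much data, how many nodes'."""
    return {
        "nodes": stats["nodes"]["count"]["total"],
        "primary_shards": stats["indices"]["shards"]["primaries"],
        "store_size": human_size(stats["indices"]["store"]["size_in_bytes"]),
    }
```

Calling `summarize(fetch_cluster_stats())` against a live cluster would show at a glance whether the store size is in the terabyte or petabyte range.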


Why is it not possible? Is there some limit?

(Christian Dahlqvist) #13

Please provide the output from the API I linked to.

(Mike Barretta) #14

@lvic if you just want to know how to change the shard count of an existing index, see:
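
One option Elasticsearch offers for raising the primary shard count of an existing index is the split index API; it may or may not be what was referenced in this post. The sketch below only builds the two requests involved (block writes on the source index, then split into a new index). The index names and the helper `build_split_requests` are hypothetical, and a real split can also be constrained by the source index's `number_of_routing_shards` setting, which this sketch does not check.

```python
def build_split_requests(source, target, current_primaries, new_primaries):
    """Return (method, path, body) tuples for splitting an index.

    The split index API requires the new primary shard count to be a
    multiple of the current one, and the source index must block writes.
    """
    if new_primaries <= current_primaries or new_primaries % current_primaries != 0:
        raise ValueError(
            "new_primaries must be a larger multiple of current_primaries"
        )
    # Step 1: make the source index read-only for writes.
    block_writes = (
        "PUT",
        f"/{source}/_settings",
        {"settings": {"index.blocks.write": True}},
    )
    # Step 2: split the source into a new index with more primary shards.
    split = (
        "POST",
        f"/{source}/_split/{target}",
        {"settings": {"index.number_of_shards": new_primaries}},
    )
    return [block_writes, split]


# Example: going from 3 to 6 primary shards (index names are made up).
for method, path, body in build_split_requests("my-index", "my-index-split", 3, 6):
    print(method, path, body)
```

The split produces a new index and leaves the source in place, so existing data is preserved. When the desired shard count is not a multiple of the current one, reindexing into a freshly created index is the usual alternative.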
