Scaling up for petabyte sizes?


#1

I had a performance problem: ES queries became really slow when the dataset grew to several petabytes... What approach can I use to scale up for larger datasets while preserving the original data? For example: is it possible to increase the number of primary shards in a running ES cluster?
Thank you,


(Christian Dahlqvist) #2

How much data did you have? How many nodes? Did you identify what was limiting performance (CPU, memory, network, disk)?


#3

3 nodes / 3 primary shards. I don't see resource over-utilization as such; the most limiting factor is probably disk usage - ES data takes up ~80% of the available disk space.
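
For reference, per-node disk usage can be checked with the cat allocation API (a minimal example, assuming the cluster is reachable at localhost:9200):

    curl -s 'http://localhost:9200/_cat/allocation?v&h=node,disk.used,disk.avail,disk.percent'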


(Christian Dahlqvist) #4

How is this related to the 3-node cluster?


#5

Sorry, I'm not sure I understand


#6

Here's the general question: is it possible to increase cluster/sharding size while preserving existing data?


(Christian Dahlqvist) #7

You said you had performance problems when the dataset grew to several petabytes. That is clearly not possible with the 3 nodes you then mentioned, which leaves me confused.


#8

Sorry, my bad. I meant 3 replicas


(Christian Dahlqvist) #9

I still do not understand. Can you please clarify? How much data did you have in the cluster? How many nodes were used?


#10

Close to 2 petabytes of data, 3 nodes.


(Christian Dahlqvist) #11

That is not possible. Are you mixing up your units? Is it by any chance 2 terabytes?

The easiest way to determine the amount of data is probably for you to provide the full output of the cluster stats API.
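
A minimal example of calling it, assuming the cluster is reachable at localhost:9200:

    curl -s 'http://localhost:9200/_cluster/stats?human&pretty'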


#12

Why is it not possible? Is there some limit?


(Christian Dahlqvist) #13

Please provide the output from the API I linked to.


(Mike Barretta) #14

@lvic if you just want to know how to change the shard count of an existing index, see:
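
A minimal sketch of one way to do this with the split index API (assuming ES 7.x or later, a cluster at localhost:9200, and an index named my-index with 3 primary shards; the index name and target shard count are placeholders):

    # Block writes on the source index; the split API requires this
    curl -s -X PUT 'http://localhost:9200/my-index/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.blocks.write": true}}'

    # Split the 3-shard index into a new 6-shard index
    curl -s -X POST 'http://localhost:9200/my-index/_split/my-index-split' \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.number_of_shards": 6}}'

Note that the target shard count must be a multiple of the source shard count, and the result is a new index under a new name; the alternative is to reindex into a new index created with more primary shards.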


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.