Let us assume we create an index with 2 shards and 1 replica on two
nodes on one Elasticsearch box (box1). After indexing some documents,
the disk space (say 1000GB) on box1 will fill up. To index more
documents, I will add 2 more nodes (one more box, say box2, with
1000GB of disk space). What is the mechanism to use all 4 nodes (two
boxes) with only one index holding 2000GB of data?
What I am thinking is this: if we dynamically add 2 more primary
shards to the index on box1, the new shards will get allocated on the
second box. To do this, I want to know how to create shards.
Apart from creating shards dynamically, is there any other mechanism
to scale?
As David mentioned, the number of primary shards is not configurable
after index creation. You have a few options, and you may want to do
all of them.
The first is to create your index with more shards up front. This
gives you more room to grow horizontally across nodes. You can't store
a 10TiB index on 100 1TiB nodes if you only have 2 shards, because
each shard would be 5TiB. So, calculate a reasonable shard count given
the size of the resources you have available.
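The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the helper names are made up for this example, it assumes data spreads evenly across shards, and it ignores replicas and free-space headroom. The `number_of_shards` and `number_of_replicas` keys are the standard index settings passed at creation time.

```python
import json
import math


def shard_size_tib(index_size_tib: float, primary_shards: int) -> float:
    """Approximate size of one primary shard, assuming an even spread."""
    return index_size_tib / primary_shards


def min_primary_shards(index_size_tib: float, node_capacity_tib: float) -> int:
    """Smallest shard count for which one shard still fits on one node."""
    return math.ceil(index_size_tib / node_capacity_tib)


def create_index_body(primary_shards: int, replicas: int = 1) -> str:
    """JSON body for index creation; the primary shard count is fixed afterwards."""
    return json.dumps({
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    })


if __name__ == "__main__":
    # 10TiB over 2 shards is 5TiB per shard, too big for a 1TiB node.
    print(shard_size_tib(10, 2))
    # At least 10 shards are needed before a shard fits on a 1TiB node.
    print(min_primary_shards(10, 1))
    print(create_index_body(min_primary_shards(10, 1)))
```

In practice you would probably pick a number above this minimum, since replicas also consume disk and nodes should not run completely full.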
The next tool at your disposal, once you've exhausted the previous
step's capacity, is aliases. You'll want to roll over into a new
index and create an alias that points to both the old and new index.
You don't even have to do this step if your older data doesn't need to
be queried seamlessly. You could either disable the old index or leave
it to be queried manually, but an alias allows you to query both
indices with no client knowledge that they're separate. You can also
set up alias filters to enforce logical partitions of data.
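As a rough sketch of the alias step, the snippet below builds the JSON body you would POST to the `_aliases` endpoint. The index names (`docs-v1`, `docs-v2`), the alias names, and the `tenant` field are hypothetical, and the helper functions are made up for this example. It only builds the request body and does not contact a cluster.

```python
import json
from typing import Optional


def add_alias_action(index: str, alias: str,
                     filter_query: Optional[dict] = None) -> dict:
    """Build one "add" action; an optional filter makes it a filtered alias."""
    action = {"index": index, "alias": alias}
    if filter_query is not None:
        action["filter"] = filter_query
    return {"add": action}


def aliases_body(actions: list) -> str:
    """Wrap the actions in the body expected by the _aliases endpoint."""
    return json.dumps({"actions": actions}, indent=2)


if __name__ == "__main__":
    body = aliases_body([
        # One alias spanning the old and the new index.
        add_alias_action("docs-v1", "docs"),
        add_alias_action("docs-v2", "docs"),
        # A filtered alias as a logical partition (hypothetical field name).
        add_alias_action("docs-v2", "docs-tenant-a",
                         {"term": {"tenant": "a"}}),
    ])
    print(body)
```

Clients would then search against `docs` without needing to know that two indices sit behind it.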
At some point you'll exhaust the limits of that cluster. The only
option then will be to create another cluster and either migrate data
to it or use both clusters together. We do the latter through load
balancers with great success, managing 20 clusters of 1PiB of data.
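For the "use both clusters together" case, one possible approach (not necessarily how the setup described above works) is to pick a cluster per request from a stable hash of some routing key. The cluster URLs and the `pick_cluster` helper below are hypothetical, and this is only a minimal sketch.

```python
import hashlib

# Hypothetical cluster endpoints, e.g. the load balancer in front of each cluster.
CLUSTERS = [
    "https://es-cluster-a.example.internal:9200",
    "https://es-cluster-b.example.internal:9200",
]


def pick_cluster(routing_key: str, clusters: list = CLUSTERS) -> str:
    """Map a routing key to one cluster using a stable hash."""
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[bucket]


if __name__ == "__main__":
    # The same key always lands on the same cluster.
    print(pick_cluster("customer-42"))
```

Note that with plain modulo hashing, adding a cluster changes where existing keys map, so existing data would have to be migrated or looked up on more than one cluster.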