Node Local Storage and Gateway Storage

Mihir_Patel · January 18, 2012, 6:47am

Hello everyone,

I am not sure if this question has been asked before, but I wanted to
get a fresh answer in case anything has changed recently.

I am exploring options for a search application with a large index (a
few TB). We would have a few indexes (e.g. books, videos, articles,
other web resources). I looked at node storage
(http://www.elasticsearch.org/guide/reference/index-modules/store.html)
and gateway storage
(http://www.elasticsearch.org/guide/reference/modules/gateway/) but
would like to clarify a couple of scenarios/approaches. Could someone
please help me with answers to the following questions?

Is it possible to store index data locally and still have gateway
persistence to S3 for long-term storage?

What's the benefit of using multiple indexes with respect to index
size? Purely from a search performance perspective (for large index
data), which option is better: a) use a single flat index, or b) use
multiple indices and search across several of them only in the rare
cases where we really need to?

How does sharding work when adding and removing nodes? Consider the
following scenario:
a. Start a cluster with 1 node and 1 index with 5 shards and 1
replica.
b. Index documents (let's assume that all shards get some data). We
still have a single node, so it will hold all shards, and the cluster
status will be yellow at this point.
c. Now we add a new node to the cluster. Would the replicas be
transferred over to the new node, or some of the 5 shards, or both?
Will this bring the cluster status to green?
d. Now let's assume that we add 3 more nodes, so at this point we
have 5 nodes in total. Will the cluster be green with 5 nodes, or do
we absolutely need MAX(shards) * MAX(replicas) nodes for the cluster
to turn green?
e. Now we add 5 more nodes, for a total of 10. At this point, will
every node hold a single shard (5 primaries + 5 replicas)? How is
this calculated?
f. In what scenario would we lose shards (let's assume we are using
the local gateway)? If I keep bringing down nodes one at a time until
I am back to a single node, can I return to a yellow cluster state
(point b) without losing any shards/data?

Appreciate your help.

Thanks,
Mihir

kimchy · January 18, 2012, 9:18pm

On Wed, Jan 18, 2012 at 8:47 AM, Mihir Patel exploremihir@gmail.com wrote:

Is it possible to store index data locally and still have gateway
persistence to S3 for long-term storage?

It's not an option, that's how it works: even with the S3 gateway,
nodes still keep a local copy of the data that is stored on S3, with
the shards spread across the nodes.

What's the benefit of using multiple indexes with respect to index
size? Purely from a search performance perspective (for large index
data), which option is better: a) use a single flat index, or b) use
multiple indices and search across several of them only in the rare
cases where we really need to?

Searching 1 index with 100 shards is the same as searching across 100
indices with 1 shard each.
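A minimal Python sketch of why this holds, assuming a simplified scatter/gather model rather than Elasticsearch's actual code: a search fans out to every shard involved, each shard returns its local top hits, and the results are merged. Grouping the shards into one index or into many does not change which shards get queried. The helpers `search_shard`, `scatter_gather` and `search_indices` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory "shard": maps doc_id -> score for some query.
Shard = Dict[str, float]


def search_shard(shard: Shard, k: int) -> List[Tuple[float, str]]:
    """Return the shard-local top-k hits as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score, doc) for doc, score in shard.items()))


def scatter_gather(shards: List[Shard], k: int) -> List[Tuple[float, str]]:
    """Query every shard, then merge the per-shard top-k into a global top-k."""
    per_shard = [search_shard(s, k) for s in shards]
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


def search_indices(indices: Dict[str, List[Shard]], names: List[str], k: int):
    """Searching several indices simply flattens their shards into one list."""
    shards = [s for name in names for s in indices[name]]
    return scatter_gather(shards, k)
```

With the same four shards, searching one 4-shard index and searching four 1-shard indices return identical hits. The sketch ignores scoring details such as per-shard term statistics.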

How does sharding work when adding and removing nodes? Consider the
following scenario:
a. Start a cluster with 1 node and 1 index with 5 shards and 1
replica.
b. Index documents (let's assume that all shards get some data). We
still have a single node, so it will hold all shards, and the cluster
status will be yellow at this point.
c. Now we add a new node to the cluster. Would the replicas be
transferred over to the new node, or some of the 5 shards, or both?
Will this bring the cluster status to green?

Yes, the 5 replicas will be allocated to the other node.
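The yellow/green behavior in points b through d can be modeled with a short Python sketch. It assumes only the rule that copies of the same shard (a primary and its replicas) are never placed on the same node, so green needs at least `1 + replicas` nodes regardless of how many primaries there are. `cluster_health` is a hypothetical helper, not an Elasticsearch API, and it ignores every other allocation rule.

```python
from typing import Tuple


def cluster_health(num_nodes: int, num_primaries: int, num_replicas: int) -> Tuple[str, int]:
    """Return (status, unassigned_replica_count) under a simplified model.

    Assumption: copies of the same shard must sit on distinct nodes;
    disk limits, awareness and other allocation rules are ignored.
    """
    if num_nodes < 1:
        raise ValueError("this sketch assumes at least one node")
    copies_per_shard = 1 + num_replicas
    # Each shard group can use at most one copy per distinct node.
    placeable = min(num_nodes, copies_per_shard)
    unassigned = (copies_per_shard - placeable) * num_primaries
    return ("green" if unassigned == 0 else "yellow"), unassigned


for nodes in (1, 2, 5, 10):
    print(nodes, cluster_health(nodes, num_primaries=5, num_replicas=1))
```

For the 5-shard, 1-replica index this gives yellow with 5 unassigned replicas on 1 node and green from 2 nodes upward, so the MAX(shards) * MAX(replicas) node count from point d is not required.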

d. Now let's assume that we add 3 more nodes, so at this point we
have 5 nodes in total. Will the cluster be green with 5 nodes, or do
we absolutely need MAX(shards) * MAX(replicas) nodes for the cluster
to turn green?

The cluster will be green. Moreover, the 10 shards you have (5
primaries with 1 replica each) will now be spread across the 5 nodes.
Use the cluster state API and you will see what is allocated where. Or
install the elasticsearch-head plugin.
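To check the allocation from a script instead of the head plugin, a rough Python sketch can read the cluster state and count shard copies per node. It assumes a node listening on `http://localhost:9200` and the `routing_table.indices.<index>.shards` layout of the cluster state response, where unassigned copies have a null `node`; verify both against your Elasticsearch version. `fetch_cluster_state` and `shards_per_node` are hypothetical helper names.

```python
import json
import urllib.request
from collections import Counter
from typing import Any, Dict


def fetch_cluster_state(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """GET /_cluster/state; base_url is an assumption about where a node listens."""
    with urllib.request.urlopen(f"{base_url}/_cluster/state") as resp:
        return json.load(resp)


def shards_per_node(state: Dict[str, Any]) -> Counter:
    """Count assigned shard copies (primaries and replicas) per node ID."""
    counts: Counter = Counter()
    for index in state["routing_table"]["indices"].values():
        for copies in index["shards"].values():
            for shard_copy in copies:
                node = shard_copy.get("node")
                if node:  # unassigned copies report node as null/None
                    counts[node] += 1
    return counts
```

Usage against a live cluster would be `print(shards_per_node(fetch_cluster_state()))`; on the 5-node cluster from point d, each node ID should appear with a count of 2.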

e. Now we add 5 more nodes, for a total of 10. At this point, will
every node hold a single shard (5 primaries + 5 replicas)? How is
this calculated?

Yes. The calculation aims to keep the number of shards per node even,
i.e. balanced across the nodes.
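A greedy Python stand-in for that balancing goal: place each shard copy on the least-loaded node that does not already hold a copy of the same shard. This is an illustration under that single assumption, not Elasticsearch's real allocator, which weighs more factors; `balance` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def balance(num_primaries: int, num_replicas: int,
            nodes: List[str]) -> Dict[str, List[Tuple[int, str]]]:
    """Greedy sketch: each copy goes to the least-loaded eligible node.

    A node is eligible for a copy only if it holds no other copy of the
    same shard. Copies that fit nowhere stay unassigned (yellow).
    """
    placement: Dict[str, List[Tuple[int, str]]] = {n: [] for n in nodes}
    for shard in range(num_primaries):
        for i in range(1 + num_replicas):
            role = "primary" if i == 0 else "replica"
            eligible = [n for n in nodes
                        if all(s != shard for s, _ in placement[n])]
            if not eligible:
                break  # remaining replicas of this shard stay unassigned
            # min() picks the first least-loaded node in list order.
            target = min(eligible, key=lambda n: len(placement[n]))
            placement[target].append((shard, role))
    return placement
```

With 5 primaries and 1 replica, this yields 2 copies per node on 5 nodes and 1 copy per node on 10 nodes; on a single node the replicas stay unassigned, which is the yellow state from point b.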

f. In what scenario would we lose shards (let's assume we are using
the local gateway)? If I keep bringing down nodes one at a time until
I am back to a single node, can I return to a yellow cluster state
(point b) without losing any shards/data?

If you bring down one node, the shards that were allocated on it will
start to be allocated on the rest of the cluster. Once the cluster
hits green, you can bring down another node. Note that even if you
lose two nodes that held both a shard and its replica, as long as you
can bring them back (with their data intact), the relevant shard will
be reallocated.
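The "wait for green, then stop the next node" procedure can be scripted around the cluster health API's `wait_for_status` parameter. A minimal Python sketch, assuming a node at `http://localhost:9200`; `stop_node` is a hypothetical callback because how you stop a node depends on your deployment, and `fetch_health`/`rolling_shutdown` are illustrative names. Pass every node except the one you keep: the last stop happens while two nodes are still up and green, after which the single remaining node can only reach yellow with 1 replica configured.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, List


def fetch_health(base_url: str = "http://localhost:9200",
                 timeout: str = "60s") -> Dict[str, Any]:
    """Block until the cluster is green or `timeout` expires; return the health JSON."""
    url = f"{base_url}/_cluster/health?wait_for_status=green&timeout={timeout}"
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: some versions report a timed-out wait with a non-2xx
        # status while still returning a JSON body that includes "status".
        return json.load(err)


def rolling_shutdown(
    nodes_to_stop: List[str],
    stop_node: Callable[[str], None],
    get_health: Callable[[], Dict[str, Any]] = fetch_health,
) -> List[str]:
    """Stop nodes one at a time, requiring a green cluster before each stop."""
    stopped: List[str] = []
    for node in nodes_to_stop:
        status = get_health().get("status")
        if status != "green":
            raise RuntimeError(f"cluster is {status!r}, not green; not stopping {node}")
        stop_node(node)
        stopped.append(node)
    return stopped
```

Refusing to continue on anything but green mirrors the advice above: never take down a node while some shard still has only one live copy.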
