I am not sure if this question has been asked before, but I wanted to get
a fresh answer in case anything has changed recently.
I am exploring options for a search application with a large index (a few
TB). We would have a few indexes (e.g. books, videos,
articles, other web resources). I looked at node storage (http://
gateway storage (http://www.elasticsearch.org/guide/reference/modules/gateway/)
but would like to clarify a couple of scenarios/approaches.
Could someone please help me with answers to the following questions?
- Is it possible to store index data locally and still have gateway
persistence to S3 for long-term durability?
- What is the benefit of using multiple indexes with respect to index
size? Purely from a search performance perspective (for large index
data), which option is better: a) use a single flat index, or b)
use multiple indices and search across several of them only in those
rare cases where you really need to?
- How does sharding work when adding and removing nodes? Consider this scenario:
a. Start a cluster with 1 node, with 1 index with 5 shards and 1 replica.
b. Index documents (let's assume that all shards get some
data). We still have a single node, so it will hold all shards, and
the cluster status will be yellow at this point.
c. Now we add a new node to the cluster. Would the replicas be
allocated to the new node, or would some of the 5 primary shards move
there, or both? Will this bring the cluster status to green?
d. Now let's assume that we add 3 more nodes, so we have 5 nodes
in total. Will the cluster status be green with 5 nodes, or do we
absolutely need MAX (shard) * MAX (replication) nodes for the cluster
status to go green?
e. Now we add 5 more nodes, for a total of 10. At this
point, will every node hold a single shard (5 primary + 5 secondary/
replica)? How is this calculated?
f. In what scenario would we lose shards (let's assume we are
using the local gateway)? If I keep bringing down nodes one at a time
until I am back to a single node, can I return to cluster state yellow
(point b) without losing any shards/data?
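
To make the scenario in the last question concrete, here is a small sketch of how I assume the numbers work out. It is only a simplified model, not how Elasticsearch actually computes allocation: it assumes 1 replica per shard, that a replica is never placed on the same node as its primary, and that nothing else (disk space, allocation settings) blocks placement. The function names are made up for illustration; number_of_shards and number_of_replicas are the index settings from step a.

```python
import json


def index_settings(shards, replicas):
    """Build the JSON settings body for an index with this shard layout."""
    return json.dumps(
        {"settings": {"index": {"number_of_shards": shards,
                                "number_of_replicas": replicas}}}
    )


def allocated_copies(nodes, shards, replicas):
    """Shard copies that can be placed when copies of one shard need distinct nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # Each shard has 1 primary + `replicas` copies, but at most one copy per node.
    return shards * min(nodes, 1 + replicas)


def expected_health(nodes, shards, replicas):
    """Assumed model: green if every copy is placed, yellow if only replicas are missing."""
    total = shards * (1 + replicas)
    placed = allocated_copies(nodes, shards, replicas)
    return "green" if placed == total else "yellow"


if __name__ == "__main__":
    print(index_settings(5, 1))
    # Steps b, c, d and e: 1, 2, 5 and 10 nodes.
    for nodes in (1, 2, 5, 10):
        placed = allocated_copies(nodes, 5, 1)
        # Last column is the average number of copies per node if spread evenly.
        print(nodes, expected_health(nodes, 5, 1), placed, placed / nodes)
```

If this model is right, green would only need (replicas + 1) nodes rather than shards * replicas nodes, and 10 nodes would average one shard copy each. Please correct me if the real allocation logic differs.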
Appreciate your help.