I'm building a very large number of ES indices. I'm trying to decide
whether to federate the data into ES indices, stored one per AWS EBS
volume, or whether to use an S3 gateway.
When I run a query, I plan to use an index alias to name the indices. If I
use one EBS volume per index, I have the problem of ensuring that those
volumes are attached to AWS instances when I need to run a query. If not,
then I need to programmatically attach the needed volumes and then start ES
nodes to join a cluster.
I'd like to have ES maintain as much of the state as possible.
Specifically, I'd like to be able to ask ES which EBS volumes need to be
attached to access a given set of ES indices.
One way to do this is to maintain a separate ES metaindex. However, I know
that ES maintains cluster-wide metadata. My question is, where does this
cluster metadata live? If I maintain a single ES node that does NOT store
index data, will it maintain the cluster metadata? Can I query the cluster
metadata when ALL of the indices that make up the cluster are not only
closed, but actually offline? What do I get back? Do I get back the
mapping meta-data? If so, could I simply add the EBS volume ID to the
index mapping meta-data?
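
To make the mapping idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the volume ID is stored under the mapping's `_meta` field; the key name `ebs_volume_id` and both helper functions are hypothetical names of my own, and the exact nesting of `_meta` inside the mapping may differ between ES versions. Whether ES will still return this mapping when the index data is offline is exactly what I'm asking.

```python
def mapping_with_volume(volume_id, properties=None):
    """Build a mapping body that carries the EBS volume ID in `_meta`.

    `ebs_volume_id` is a hypothetical key chosen for illustration.
    """
    return {
        "_meta": {"ebs_volume_id": volume_id},
        "properties": properties or {},
    }


def volume_from_mapping(mapping):
    """Read the volume ID back out of a mapping; None if it is absent."""
    return mapping.get("_meta", {}).get("ebs_volume_id")


if __name__ == "__main__":
    body = mapping_with_volume("vol-0abc", {"title": {"type": "string"}})
    print(volume_from_mapping(body))
```

The mapping body would be sent to ES when the index is created, and read back later from whatever node still holds the cluster metadata.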
Also, where are index alias definitions stored? May I query an ES non-data
node to get the alias definitions for the associated cluster? If I could
do that, then I could query the cluster, get the current alias definition,
then for each index in the alias, get the required EBS volume and make sure
that it is mounted on some AWS instance. Make sense?
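
The workflow I'm describing could be sketched roughly as follows in Python. This works on already-parsed JSON, so it makes no ES or AWS calls. I'm assuming the alias response has the shape `{index: {"aliases": {alias: {...}}}}`, and that an index-to-volume lookup has been built some other way (for example from the mapping metadata). The function names and sample values are hypothetical.

```python
def indices_in_alias(alias_response, alias):
    """Return the index names that belong to `alias`.

    Assumed response shape: {index_name: {"aliases": {alias_name: {...}}}}
    """
    return sorted(
        name
        for name, body in alias_response.items()
        if alias in body.get("aliases", {})
    )


def volumes_to_attach(indices, index_to_volume, attached):
    """List the volumes needed by `indices` that are not yet attached."""
    needed = []
    for index in indices:
        volume = index_to_volume[index]
        if volume not in attached and volume not in needed:
            needed.append(volume)
    return needed


if __name__ == "__main__":
    # Hypothetical sample data standing in for real responses.
    response = {
        "logs-01": {"aliases": {"recent": {}}},
        "logs-02": {"aliases": {"recent": {}}},
        "logs-00": {"aliases": {}},
    }
    lookup = {"logs-01": "vol-a", "logs-02": "vol-b", "logs-00": "vol-c"}
    members = indices_in_alias(response, "recent")
    print(volumes_to_attach(members, lookup, attached={"vol-a"}))
```

Attaching each missing volume through the EC2 API and then starting the ES nodes would follow; I've left that part out.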
If I let ES manage the index state by using the S3 gateway, then I would
need to know how many AWS instances I need to start up in order to load all
the required indices. This is a simpler problem, and would obviate the
need to maintain a mapping from index name to EBS volume id. However, I've
been warned that the startup time may be significant. Should I simply use
the S3 gateway?
How can I find out which ES clusters are available within a given AWS