What are the policies surrounding local storage management when using
an S3 gateway? I notice that the local store IS used (temporarily?) when
S3 is used as a storage gateway. When does that data get deleted?
Does the ES node using that local storage watch to see if the local
disk space is filling up? Does it discard the data when the index
associated with that data is closed? How can I use one AWS instance
to index data (sequentially) into N different ES indices, where N is
unbounded? Do I have to clear the local store on close? Should I use
an in-memory index for this purpose?
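Concretely, the sequential pattern I have in mind for each index is open, bulk index, flush, close. A sketch is below; `run` is a dry-run helper that just prints the curl commands rather than executing them, and the endpoint and index names are placeholders of my own:

```shell
#!/bin/sh
# Sequentially cycle one ES node through N indices: open each index,
# index into it, flush to the gateway, then close it.
ES=${ES:-http://localhost:9200}

# Dry-run helper: print the command instead of executing it.
# Replace with plain execution against a live node.
run() { echo "$@"; }

process_index() {
  run curl -s -XPOST "$ES/$1/_open"   # make the index live on this node
  # ... bulk-index this index's documents here ...
  run curl -s -XPOST "$ES/$1/_flush"  # persist segments to the gateway
  run curl -s -XPOST "$ES/$1/_close"  # done; move on to the next index
}

for idx in idx-001 idx-002; do
  process_index "$idx"
done
```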
I am indexing a large data set that grows by over 10GB (index space)
per day. I need to be able to search the last N (N>100) days of data,
but I need to maintain the last N+M (M>500) days of data for archival purposes.
My strategy is to put each day into a separate ES index. When I need
that index, I fire up an AWS EC2 instance, launch an ES node on that
instance, and open the index.
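Sketched out, the per-day part looks like this (the `logs-` naming scheme and localhost endpoint are my own placeholders; `_open` is the standard indices-open API, and GNU `date` is assumed):

```shell
#!/bin/sh
# One ES index per day: derive the index name for a given day,
# then open it on the freshly launched node.
# The "logs-" naming scheme and localhost endpoint are placeholders.

index_for_day() {
  # expects a date string GNU `date` understands, e.g. 2021-03-05
  echo "logs-$(date -d "$1" +%Y.%m.%d)"
}

open_index() {
  curl -s -XPOST "http://localhost:9200/$1/_open"
}

idx=$(index_for_day "2021-03-05")
echo "$idx"            # logs-2021.03.05
# open_index "$idx"    # uncomment against a live node
```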
This works well if my EC2 instance uses a local ES gateway and the
data for the index is already stored in the local store.
Instead, I'd like to use an S3 gateway so that I don't have to worry
about preserving each index's local store between EC2 launches.
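For reference, the configuration I'm experimenting with looks roughly like this (a sketch: the bucket name and credentials are placeholders, and the exact setting keys may differ by ES version, so verify against the gateway docs):

```shell
#!/bin/sh
# Append the S3 gateway settings I have in mind to elasticsearch.yml.
# Bucket name and credentials are placeholders; the exact setting keys
# depend on the ES version.
ES_CONF=${ES_CONF:-./elasticsearch.yml}
cat >> "$ES_CONF" <<'EOF'
gateway:
  type: s3
  s3:
    bucket: my-es-gateway-bucket
cloud:
  aws:
    access_key: PLACEHOLDER
    secret_key: PLACEHOLDER
EOF
```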
In the S3 gateway scenario, each ES node is assigned (shards of) a
newly opened index, and presumably the shard data is loaded from the S3 bucket.
However, the data is also stored locally on disk during the indexing
process. So, I must provision local disk space even though an S3
gateway is used for the backing store. This is fine if I can bound
the size of the local space needed, but if I am simply using the ES
node to index data into a number of different ES indices, then I want
the local space associated with an ES index to be freed when the ES
index is closed.
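What I'm doing by hand today is roughly the following. The `data/<cluster>/nodes/0/indices/<index>` path is just the layout I observe on my node (single data path, node ordinal 0), and I don't know whether deleting it out from under ES is safe, which is really what I'm asking:

```shell
#!/bin/sh
# Close an index, then reclaim its local disk space by hand.
# ASSUMPTIONS: ES on localhost:9200, one data path, node ordinal 0,
# layout data/<cluster>/nodes/0/indices/<index> (as observed, not documented).
ES_DATA=${ES_DATA:-/var/lib/elasticsearch/data}
CLUSTER=${CLUSTER:-elasticsearch}

local_index_path() {
  echo "$ES_DATA/$CLUSTER/nodes/0/indices/$1"
}

close_and_free() {
  curl -s -XPOST "http://localhost:9200/$1/_close"
  dir=$(local_index_path "$1")
  [ -d "$dir" ] && rm -rf "$dir"     # only if the shard directory exists
}

local_index_path "logs-2021.03.05"   # show where the space lives
# close_and_free "logs-2021.03.05"   # run against a live node only
```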