Where is the data stored? Is it on the local file system or in HDFS? Is it
persisted? What is the default configuration of the ES-yarn version? In the
standalone version, without YARN, you could configure all of this in
the config file.
The documentation page does not really describe how to configure the
different options, just that they are available. From the page:
Each container can currently access its local storage - with proper
configuration this can be kept outside the disposable container folder thus
allowing the data to live between restarts. This is the recommended
approach as it offers the best performance and due to Elasticsearch itself,
redundancy as well (through replicas).
But below it says:
If no storage is configured, out of the box Elasticsearch will use its
container storage which means when the container is disposed, so is its
data. In other words, between restarts any existing data is destroyed.
What is not described is how to configure storage, especially for the
"recommended approach" where data would live between restarts.
So how would one configure elasticsearch-yarn for the recommended approach?
Does one make changes in the elasticsearch.zip's config files? If so, what
settings?
Installing a plugin or changing a configuration means restarting each node, and since YARN is not persistent, one would
have to handle this outside of YARN.
es-yarn could potentially address that; however, at that point it becomes more of a Puppet/Chef feature, which is outside
the scope of the project.
The simplest solution is to modify the elasticsearch.zip that you are using, since that archive is installed on each
node. Whether it's a configuration change or a plugin, as long as it's part of the zip, it will be distributed to
each node.
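For what it's worth, here is a minimal sketch of that approach in Python. It copies elasticsearch.zip and appends a path.data entry (the standard Elasticsearch setting for the data directory) to the bundled config file. Several things here are assumptions, not something confirmed in this thread: that the archive contains a file whose name ends in config/elasticsearch.yml, that path.data is not already set in it, and that the directory you point it at exists and is writable on every node. The function name add_data_path is made up for illustration.

```python
import zipfile


def add_data_path(src_zip, dst_zip, data_path,
                  config_suffix="config/elasticsearch.yml"):
    """Copy src_zip to dst_zip, appending a path.data line to the config file.

    Returns the number of config files that were patched.
    Assumption: the config file does not already define path.data.
    """
    patched = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.endswith(config_suffix):
                text = data.decode("utf-8")
                # Make sure the new setting starts on its own line.
                if text and not text.endswith("\n"):
                    text += "\n"
                text += "path.data: {}\n".format(data_path)
                data = text.encode("utf-8")
                patched += 1
            # Reusing the ZipInfo keeps the entry name and permissions.
            dst.writestr(info, data)
    return patched
```

You would call it as add_data_path("elasticsearch.zip", "elasticsearch-patched.zip", "/some/dir/outside/the/container") and then hand the patched zip to es-yarn instead of the original. The target path is a placeholder; pick a location outside the disposable container folder so the data survives restarts.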