Gateways, index data encryption and index backup/restore


(Mauri) #1

Q1. Clarification of local vs shared gateway.
I understand that the shared gateways are used to save a copy of all
cluster metadata and index data to a common location.
During a full cluster restart the nodes load cluster metadata from the
gateway.
After the cluster metadata has been loaded, the nodes that host primary
index shards load any shard data files that do not already exist in the
node's local file system from the gateway and then start the primary
shard Lucene indexes.
After the primary shards have been established, the replica shards are
initialised from the nodes hosting the primary shards.

I am unclear what the local gateways do when a cluster is recovered
using local gateways.
In this situation the nodes can recover cluster metadata from their
local disc and start their primary shard Lucene indexes because the
index data is already on local disc.
Replicas can then be initialised from primaries as for the shared
gateways.
What am I missing with regards to the local gateway?
Does the local gateway do some sort of consolidation of cluster
metadata sourced from all active nodes?
Does the local gateway do some checking or cleanup of primary shard
data on local disc before the Lucene indexes are started?

Q2. Some applications require data to be encrypted "at rest". This can
be achieved for the Lucene files by saving them on encrypted disc
volumes using tools like TrueCrypt. I believe this can also be used
with AWS EBS volumes.
With regards to the shared gateway options (S3, Hadoop and shared FS)
is there an option to encrypt cluster metadata and index data before
it is saved so that it is encrypted "at rest"?

Q3. I have noticed that many people have asked about, and even
attempted, backup/restore of individual indexes. Are there any
concrete plans for providing this as a built-in ES capability?
If so, in what version might we see this?
Will this have encryption capability?

Regards
Mauri


(Shay Banon) #2

On Thursday, February 23, 2012 at 2:40 AM, Mauri wrote:

Q1. Clarification of local vs shared gateway.
I understand that the shared gateways are used to save a copy of all
cluster metadata and index data to a common location.
During a full cluster restart the nodes load cluster metadata from the
gateway.
After the cluster metadata has been loaded, the nodes that host primary
index shards load any shard data files that do not already exist in the
node's local file system from the gateway and then start the primary
shard Lucene indexes.
After the primary shards have been established, the replica shards are
initialised from the nodes hosting the primary shards.

I am unclear what the local gateways do when a cluster is recovered
using local gateways.
In this situation the nodes can recover cluster metadata from their
local disc and start their primary shard Lucene indexes because the
index data is already on local disc.
Replicas can then be initialised from primaries as for the shared
gateways.
What am I missing with regards to the local gateway?
Does the local gateway do some sort of consolidation of cluster
metadata sourced from all active nodes?
Does the local gateway do some checking or cleanup of primary shard
data on local disc before the Lucene indexes are started?

Yes, the local gateway picks the metadata with the latest version from among the running nodes and uses it. It then finds the latest version of each shard and uses those copies to initialize.
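
To make the selection described above concrete, here is a minimal sketch in Python. It is not elasticsearch code: the `NodeState` type, its fields, and both function names are hypothetical, and it assumes each node simply reports an integer version for its cluster metadata and for each shard copy it holds on disc.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical key for a shard: (index name, shard number).
ShardKey = Tuple[str, int]


@dataclass(frozen=True)
class NodeState:
    """What a node is assumed to report from its local disc (illustrative only)."""
    node_id: str
    metadata_version: int
    shard_versions: Dict[ShardKey, int]


def pick_latest_metadata(nodes: Iterable[NodeState]) -> Optional[NodeState]:
    """Return the node holding the highest metadata version, or None if no nodes."""
    return max(nodes, key=lambda n: n.metadata_version, default=None)


def pick_primaries(nodes: Iterable[NodeState]) -> Dict[ShardKey, str]:
    """For each shard, choose the node that holds the highest version of it."""
    best: Dict[ShardKey, Tuple[str, int]] = {}
    for node in nodes:
        for shard, version in node.shard_versions.items():
            current = best.get(shard)
            # Keep the first node seen on ties; replace only on a strictly newer copy.
            if current is None or version > current[1]:
                best[shard] = (node.node_id, version)
    return {shard: node_id for shard, (node_id, _) in best.items()}
```

In this sketch the metadata and the shard copies are chosen independently, so the node with the newest metadata need not be the one that ends up hosting a given primary. How ties and missing copies are actually handled is not covered in this thread.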

Q2. Some applications require data to be encrypted "at rest". This can
be achieved for the Lucene files by saving them on encrypted disc
volumes using tools like TrueCrypt. I believe this can also be used
with AWS EBS volumes.
With regards to the shared gateway options (S3, Hadoop and shared FS)
is there an option to encrypt cluster metadata and index data before
it is saved so that it is encrypted "at rest"?

No, there is no such option.

Q3. I have noticed that many people have asked about, and even
attempted, backup/restore of individual indexes. Are there any
concrete plans for providing this as a built-in ES capability?
If so, in what version might we see this?
Will this have encryption capability?

Yes, there is a plan to add support for such APIs in elasticsearch itself. In 0.19 we made progress in how we store local gateway state (it's encapsulated in each index and shard).


