ES Backup and Restore


(Ariel Mendoza) #1

I'm currently working on ES backup and restore using the ES Java API.

I have the following process in mind, but there are still some missing
pieces in the puzzle.

  1. Stop the database index to disable user access.
  2. Query the ES cluster state to get the database (ES index)
     configuration, which defines the number of shards, the
     location of the shards and replicas, and the database mappings.
  3. Save the database metadata as a JSON file to the backup location.
  4. Iterate over all servers containing the primary shards and copy all
     primary shard Lucene index files to the backup location.
  5. Restart the database to re-enable user access.

How can I achieve #1 and #5?
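For context, the five steps could be driven by a skeleton like the one below. Every hook name here is hypothetical; nothing in it is a real Elasticsearch call:

```python
def run_backup(ops):
    """Drive the five backup steps; each entry in ops is a
    caller-supplied, hypothetical hook."""
    ops["stop_index"]()            # 1. disable user access
    meta = ops["query_state"]()    # 2. shard layout, settings, mappings
    ops["save_metadata"](meta)     # 3. metadata as JSON -> backup location
    ops["copy_primaries"](meta)    # 4. copy primary-shard Lucene files
    ops["restart_index"]()         # 5. re-enable user access
```

The open questions are what "stop" and "restart" should actually mean for an ES index.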


(Berkay Mollamustafaoglu-2) #2

I think you should be able to flush the index and make it read-only to
accomplish #1. The index would still be accessible for queries, but not
for writes.

http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html
https://github.com/elasticsearch/elasticsearch/issues/1573
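As an illustration, that could look like the sequence of REST calls below, assuming an ES version where `index.blocks.read_only` can be updated dynamically. The cluster address and the exact endpoint shapes are assumptions here; the calls are built as data rather than sent:

```python
import json

ES = "http://localhost:9200"  # assumed cluster address, for illustration

def read_only_call(index, flag):
    """PUT that toggles the index.blocks.read_only setting."""
    body = json.dumps({"index": {"blocks": {"read_only": flag}}})
    return ("PUT", "%s/%s/_settings" % (ES, index), body)

def flush_call(index):
    """POST that flushes the index (forces a Lucene commit)."""
    return ("POST", "%s/%s/_flush" % (ES, index), None)

# Backup window: block writes, flush, copy the files, then unblock.
plan = [read_only_call("my_db", True),
        flush_call("my_db"),
        read_only_call("my_db", False)]
```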

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype



(Shay Banon) #3

You don't have to disable user access to Elasticsearch; you can simply disable flush using the update settings API, so copying over the shard index data will not happen while a Lucene commit is in progress.

You can simply copy over the data location itself, which also includes the metadata of the cluster and indices. Why not just do that? Or do you want to make sure you only copy the "primary" shard data?
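A sketch of that flow, with the copy step wrapped by the flush toggle. Only the `index.translog.disable_flush` setting name comes from the post; `es_put` and `copy_shard_files` are hypothetical caller-supplied hooks:

```python
import json

def flush_settings_body(disable):
    """Body for the update-settings call that toggles translog flush."""
    return json.dumps({"index": {"translog": {"disable_flush": disable}}})

def backup_window(es_put, index, copy_shard_files):
    """Disable flush, copy the shard data, then re-enable flush.

    es_put(path, body) issues a PUT against the cluster and
    copy_shard_files() copies the data directory; both are
    hypothetical hooks supplied by the caller.
    """
    es_put("/%s/_settings" % index, flush_settings_body(True))
    try:
        copy_shard_files()  # no Lucene commit can land while flush is off
    finally:
        es_put("/%s/_settings" % index, flush_settings_body(False))
```

The try/finally matters: flush must be re-enabled even if the copy fails.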



(Ariel Mendoza) #4

Yes, I only want to copy the primaries. Thank you for the help!



(Shay Banon) #5

I would advise going ahead and backing up the data location as is. Backing up just the primaries is difficult to do currently (you can use the cluster state API to figure out where each primary lives and back up only those shards, but you also need to make sure you back up the index metadata and the global metadata).
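To illustrate the cluster-state approach: the routing table in the state response lists every shard copy with its `primary` flag and the id of the node holding it. The sample below mirrors the shape of a 0.x `_cluster/state` response, but treat the exact field names as an assumption:

```python
def primary_locations(cluster_state):
    """Map (index, shard_id) -> node id of the started primary copy,
    read from a cluster-state routing table."""
    out = {}
    for index, data in cluster_state["routing_table"]["indices"].items():
        for shard_id, copies in data["shards"].items():
            for copy in copies:
                if copy["primary"] and copy["state"] == "STARTED":
                    out[(index, int(shard_id))] = copy["node"]
    return out

# Trimmed example in the assumed shape of a cluster-state response
state = {"routing_table": {"indices": {"my_db": {"shards": {
    "0": [{"state": "STARTED", "primary": True,  "node": "nodeA"},
          {"state": "STARTED", "primary": False, "node": "nodeB"}]}}}}}
```

Even with this map in hand, the index metadata and global metadata still have to be saved separately, which is why copying the whole data location is simpler.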



(Ariel Mendoza) #6

Yes, I have some difficulties, but I'm trying to meet the requirements,
and I like the idea of excluding the replicas, which may be numerous,
and keeping only the primaries in the backup. In the process, I'm also
trying to fill in the answers on how, why, and definitely why not,
before I can discuss this and move on to another, simpler approach.

Currently, I have the code below:

    final List<String> indices = Lists.newArrayList(dbName);

    final ClusterStateRequest clusterStateRequest = Requests.clusterStateRequest()
            .filterRoutingTable(false)
            .filterNodes(false) // haven't decided yet if true or false
            .filteredIndices(indices.toArray(new String[0]));

    ActionListener<ClusterStateResponse> al = new ActionListener<ClusterStateResponse>() {
        @Override
        public void onResponse(ClusterStateResponse response) {
            ClusterState state = response.state();
            MetaData metaData = state.metaData();

            logger.info(String.format("cluster_name = %s\n", response.clusterName().value()));

            for (IndexMetaData indexMetaData : metaData) {
                // Skip indices not in the indices list; there should only
                // be one, but check just in case
                if (indices.isEmpty() || !indices.contains(indexMetaData.getIndex())) {
                    continue;
                }

                //logger.info(String.format("index = %s\n", indexMetaData.getIndex()));
                //logger.info(String.format("NumberOfReplicas = %d\n", indexMetaData.getNumberOfReplicas()));
                //logger.info(String.format("NumberOfShards = %d\n", indexMetaData.getNumberOfShards()));

                ImmutableSettings settings = (ImmutableSettings) indexMetaData.getSettings();

                // Details of settings
                ImmutableMap<String, String> settingsMap = settings.getAsMap();
                //logger.info(String.format("Settings = %s\n", settingsMap));

                ImmutableSet<Entry<String, MappingMetaData>> entries =
                        indexMetaData.getMappings().entrySet();
                for (Map.Entry<String, MappingMetaData> mappingEntry : entries) {
                    // Details of mapping
                }

                if (!clusterStateRequest.filterRoutingTable()) {
                    logger.info("indices");
                    for (IndexRoutingTable indexRoutingTable : state.routingTable()) {
                        logger.info(indexRoutingTable.index());
                        logger.info("shards");
                        for (IndexShardRoutingTable indexShardRoutingTable : indexRoutingTable) {
                            // Details of location, status (isPrimary), etc.,
                            // but this seems not to be enough
                        }
                    }
                }
            }
        }

        private String shardDetail(ShardRouting shard) {
            StringBuilder sb = new StringBuilder();
            sb.append("\tstate = ").append(shard.state()).append("\n");
            sb.append("\tprimary = ").append(shard.primary()).append("\n");
            sb.append("\tnode = ").append(shard.currentNodeId()).append("\n");
            sb.append("\trelocating_node = ").append(shard.relocatingNodeId()).append("\n");
            sb.append("\tshard = ").append(shard.shardId().id()).append("\n");
            sb.append("\tindex = ").append(shard.shardId().index().name()).append("\n");
            return sb.toString();
        }

        @Override
        public void onFailure(Throwable e) {
            // Handle the error
            e.printStackTrace(System.out);
        }
    };

    try {
        // Set index.translog.disable_flush to true
        updateIndicesFlush(true);
        // Get the cluster state (note: state() is asynchronous, so this
        // finally block re-enables flush before onResponse completes)
        this.getClient()
                .admin()
                .cluster()
                .state(clusterStateRequest, al);
    } finally {
        // Set index.translog.disable_flush back to false
        updateIndicesFlush(false);
    }

Here's the sample folder structure generated:

    TestCluster1
    - nodes
      - 0
        + _state
        - indices
          - my_db
            + 0  (let's have this as a primary, just for example)
            ...
          - system
            + 0  (let's have this as a primary, just for example)
      + 1
        ....

The details I have regarding the location don't seem to be enough. I
only have the cluster name, the default folders (e.g. nodes and
indices), and the following:

    - indices
      - my_db
        + 0
      - system
        + 0

I'm trying to figure out, from the code above, how to get which folders
are inside the nodes folder:

    - nodes
      + 0
      + 1
      ...
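For reference, the local path to one shard's Lucene files could be composed from these pieces. The layout, and the reading of `nodes/<N>` as a local node ordinal (used when several nodes share one data directory), are assumptions based on the tree above, not something the API returns directly:

```python
import os.path

def shard_index_dir(data_path, cluster, node_ordinal, index, shard_id):
    """Compose the assumed on-disk location of one shard's Lucene files."""
    return os.path.join(data_path, cluster, "nodes", str(node_ordinal),
                        "indices", index, str(shard_id), "index")

# e.g. shard 0 of my_db on the first local node of TestCluster1
```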

Also, how do I know whether something is global metadata?

Regards,
Ariel



(Shay Banon) #7

That's where I was saying that it's going to be quite tricky to implement something like this yourself. You can tell which node each shard lives on based on its current node id (and then going through the list of nodes to get that node's IP address and the like). Global metadata covers things like index templates and persistent cluster settings.
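Continuing the illustration: the node ids in the routing table can be looked up in the nodes section of the same cluster-state response. The field names used below (`nodes`, `transport_address`) are assumptions based on the 0.x REST output:

```python
def node_address(cluster_state, node_id):
    """Resolve a routing-table node id to a reachable address."""
    node = cluster_state["nodes"][node_id]
    return node.get("transport_address", node.get("name"))

# Trimmed example in the assumed shape of the nodes section
state = {"nodes": {"nodeA": {"name": "Node A",
                             "transport_address": "inet[/10.0.0.1:9300]"}}}
```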


