Trouble debugging Index Shard Snapshot Failed Exception

Hi,

I am stuck on this. Details follow. Sorry this is so long. Thanks in advance.

IndexShardSnapshotFailedException[Unable to parse Json String.]; nested: AmazonClientException[Unable to parse Json String.]; nested: JsonMappingException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; nested: IllegalArgumentException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; ",

I am hoping someone can help get further with this. I have a plugin that starts with the cloud-aws plugin code and modifies to support KMS server side encryption. I have to do this work on the 2.4 branch.

I am at the point now where if the S3 bucket is empty, the snapshot put works. But once the bucket has contents it fails.

Here is the curl I am executing which works when the bucket is empty

curl 
                -XPUT "localhost:9200/_snapshot/${es_repo}" 
                -H 'Content-Type: application/json' 
                "type": "s3",
            	"settings": {
                "bucket": "${es_repo}",
                "compress": true,
                "protocol": "https",
                "kms_master_key_id":"alias/${es_repo}",
                "server_side_encryption": true
            }
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   252  100   252    0     0  40691      0 --:--:-- --:--:-- --:--:-- 42000
{
  "snapshot" : {
    "snapshot" : "20170612-170348",
    "version_id" : 2040599,
    "version" : "2.4.5",
    "indices" : [ "kimchy", "twitter", "another_user" ],
    "state" : "SUCCESS",
    "start_time" : "2017-06-12T17:03:48.864Z",
    "start_time_in_millis" : 1497287028864,
    "end_time" : "2017-06-12T17:03:51.982Z",
    "end_time_in_millis" : 1497287031982,
    "duration_in_millis" : 3118,
    "failures" : [ ],
    "shards" : {
      "total" : 3,
      "failed" : 0,
      "successful" : 3
    }
  }
}
 but it does not work when the bucket has stuff in it

curl 
                -XPUT "localhost:9200/_snapshot/${es_repo}" 
                -H 'Content-Type: application/json' 
                "type": "s3",
            	"settings": {
                "bucket": "${es_repo}",
                "compress": true,
                "protocol": "https",
                "kms_master_key_id":"alias/${es_repo}",
                "server_side_encryption": true
            }

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   252  100   252    0     0  61643      0 --:--:-- --:--:-- --:--:-- 63000
{
  "snapshot" : {
    "snapshot" : "20170612-170443",
    "version_id" : 2040599,
    "version" : "2.4.5",
    "indices" : [ "kimchy", "twitter", "another_user" ],
    "state" : "PARTIAL",
    "start_time" : "2017-06-12T17:04:43.281Z",
    "start_time_in_millis" : 1497287083281,
    "end_time" : "2017-06-12T17:04:44.805Z",
    "end_time_in_millis" : 1497287084805,
    "duration_in_millis" : 1524,
    "failures" : [ {
      "index" : "another_user",
      "shard_id" : 0,
      "reason" : "IndexShardSnapshotFailedException[Unable to parse Json String.]; nested: AmazonClientException[Unable to parse Json String.]; nested: JsonMappingException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; nested: IllegalArgumentException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; ",
      "node_id" : "NN3iDI3NTcyu3zicB9CNEQ",
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "index" : "kimchy",
      "shard_id" : 0,
      "reason" : "IndexShardSnapshotFailedException[Unable to parse Json String.]; nested: AmazonClientException[Unable to parse Json String.]; nested: JsonMappingException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; nested: IllegalArgumentException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; ",
      "node_id" : "NN3iDI3NTcyu3zicB9CNEQ",
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "index" : "twitter",
      "shard_id" : 0,
      "reason" : "IndexShardSnapshotFailedException[Unable to parse Json String.]; nested: AmazonClientException[Unable to parse Json String.]; nested: JsonMappingException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; nested: IllegalArgumentException[Can not access public java.util.LinkedHashMap() (from class java.util.LinkedHashMap; failed to set access: access denied (\"java.lang.reflect.ReflectPermission\" \"suppressAccessChecks\")]; ",
      "node_id" : "NN3iDI3NTcyu3zicB9CNEQ",
      "status" : "INTERNAL_SERVER_ERROR"
    } ],
    "shards" : {
      "total" : 3,
      "failed" : 3,
      "successful" : 0
    }
  }
}

Really beguiling is that when it does fail with INTERNAL_SERVER_ERROR there is nothing logged to indicate failure or what the INTERNAL_SERVER_ERROR is.

Note, in my own code, I wrap AWS calls with

 AccessController.doPrivileged(new PrivilegedAction() {
                    public Object run() {

which has gotten me further along but I can't apply this technique to the ES code itself.

Thanks in advance.

I don't have access to your code so I can't give more details but you need to make sure that you followed that guide: https://www.elastic.co/guide/en/elasticsearch/plugins/current/plugin-authors.html#plugin-authors-jsm

Don't forget to add the plugin-security.policy file in your classpath.

I think you can print the full SM stack trace using https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/envvars003.html

It might gives you some clues about where this code is called from. In which case you might need to wrap the calling code within a doPrivileged block.

Thanks. This is helpful. I do have plugin-security.policy file (same the cloud-aws) version. Printing the full stack trace will be really helpful. Knowing where else I have to wrap code in a doPrivileged is what I need, I think. Interesting that the cloud-aws code (from which my code is heavily based) doesn't use doPrivileged where I need to in my code.

Thanks again. I'll let you know how it goes.

Here's an update.

I looked for the IndexShardSnapshotFailedException and found it here:

org/elasticsearch/snapshots/SnapshotShardsService.java:315

I wrapped the me method with a AccessController.doPrivileged

    private void snapshot(final IndexShard indexShard, final SnapshotId snapshotId, final IndexShardSnapshotStatus snapshotStatus) {
        System.setProperty("java.security.debug=","access,failure");
        logger.debug("snapshot about to  doPrivileged");

        AccessController.doPrivileged(new PrivilegedAction() {
            public Object run() {
                IndexShardRepository indexShardRepository = snapshotsService.getRepositoriesService().indexShardRepository(snapshotId.getRepository());
                ShardId shardId = indexShard.shardId();
                if (!indexShard.routingEntry().primary()) {
                    throw new IndexShardSnapshotFailedException(shardId, "snapshot should be performed only on primary");
                }
                if (indexShard.routingEntry().relocating()) {
                    // do not snapshot when in the process of relocation of primaries so we won't get conflicts
                    throw new IndexShardSnapshotFailedException(shardId, "cannot snapshot while relocating");
                }
                if (indexShard.state() == IndexShardState.CREATED || indexShard.state() == IndexShardState.RECOVERING) {
                    // shard has just been created, or still recovering
                    throw new IndexShardSnapshotFailedException(shardId, "shard didn't fully recover yet");
                }

                try {
                    // we flush first to make sure we get the latest writes snapshotted
                    SnapshotIndexCommit snapshotIndexCommit = indexShard.snapshotIndex(true);
                    try {
                        indexShardRepository.snapshot(snapshotId, shardId, snapshotIndexCommit, snapshotStatus);
                        if (logger.isDebugEnabled()) {
                            StringBuilder sb = new StringBuilder();
                            sb.append("snapshot (").append(snapshotId.getSnapshot()).append(") completed to ").append(indexShardRepository).append(", took [").append(TimeValue.timeValueMillis(snapshotStatus.time())).append("]\n");
                            sb.append("    index    : version [").append(snapshotStatus.indexVersion()).append("], number_of_files [").append(snapshotStatus.numberOfFiles()).append("] with total_size [").append(new ByteSizeValue(snapshotStatus.totalSize())).append("]\n");
                            logger.debug(sb.toString());
                        }
                    } finally {
                        snapshotIndexCommit.close();
                    }
                } catch (SnapshotFailedEngineException e) {
                    throw e;
                } catch (IndexShardSnapshotFailedException e) {
                    e.printStackTrace();
                    logger.info("IndexShardSnapshotFailedException  " + e.getDetailedMessage());

                    logger.error("IndexShardSnapshotFailedException " + e.getDetailedMessage(), e);
                    throw e;
                } catch (Throwable e) {
                    throw new IndexShardSnapshotFailedException(shardId, "Failed to snapshot", e);
                }
                return "Nothing to return";
            }});
        logger.debug("snapshot done with  doPrivileged");

    }

and now I get this:

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   252  100   252    0     0  41049      0 --:--:-- --:--:-- --:--:-- 42000
{
  "snapshot" : {
    "snapshot" : "20170613-201410",
    "version_id" : 2040599,
    "version" : "2.4.5-SNAPSHOT",
    "indices" : [ ],
    "state" : "SUCCESS",
    "start_time" : "2017-06-13T20:14:10.359Z",
    "start_time_in_millis" : 1497384850359,
    "end_time" : "2017-06-13T20:14:10.686Z",
    "end_time_in_millis" : 1497384850686,
    "duration_in_millis" : 327,
    "failures" : [ ],
    "shards" : {
      "total" : 0,
      "failed" : 0,
      "successful" : 0
    }
  }
}

I also get data in my S3 bucket (but I haven't done a restore yet).

I am still left wondering why I have to do this. If this is the only solution I can come up with, it means I'll have to fork the ES 2.4 branch which I really don't want to do.

My code is basically the cloud-aws plugin modified in a few places to support KMS. I have the same policy files etc.

Any ideas appreciated. Thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.