I have a clean install of 6.4.2 (OSS). I create a new index (single shard), load data into it (~800 MB), force merge to optimize, and finally snapshot to S3. When I try to restore the index into a different cluster (same ES version, 6.4.2 OSS) it just hangs indefinitely, with no information or errors in the logs. As an experiment I tried deleting the index on the original cluster and restoring it from S3, and got exactly the same behaviour: it just hangs and never restores.
With nothing to go on in the logs, and after several hours trawling the forums, I am completely stumped. We have been using Elasticsearch in production for years and rely heavily on S3 snapshot/restore, and I have never come across this before. Any advice would be hugely appreciated.
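For context, the sequence of calls my ETL process makes is roughly the following (index, repository and snapshot names match the logs below; the request bodies here are simplified sketches rather than the exact payloads):

POST /mat_abpris_20181026/_forcemerge?max_num_segments=1

PUT /_snapshot/snapshot_mycompany/mat_abpris_20181026?wait_for_completion=true
{
  "indices": "mat_abpris_20181026",
  "include_global_state": false
}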
ES Version: 6.4.2
AWS Instance: i3.xlarge
Storage: NVMe (Instance)
JVM: OpenJDK 1.8.0_181
/data/mycompany-etl.log
[2018-10-26T05:11:35,076][INFO ][o.e.n.Node ] [mycompany-etl] initializing ...
[2018-10-26T05:11:35,137][INFO ][o.e.e.NodeEnvironment ] [mycompany-etl] using [1] data paths, mounts [[/data (/dev/nvme0n1)]], net usable_space [825.5gb], net total_space [869.8gb], types [ext4]
[2018-10-26T05:11:35,137][INFO ][o.e.e.NodeEnvironment ] [mycompany-etl] heap size [14.9gb], compressed ordinary object pointers [true]
[2018-10-26T05:11:35,138][INFO ][o.e.n.Node ] [mycompany-etl] node name [mycompany-etl], node ID [Qt2wq6C2TXalnocWgfp2Uw]
[2018-10-26T05:11:35,138][INFO ][o.e.n.Node ] [mycompany-etl] version[6.4.2], pid[2593], build[oss/deb/04711c2/2018-09-26T13:34:09.098244Z], OS[Linux/4.15.0-1023-aws/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_181/25.181-b13]
[2018-10-26T05:11:35,138][INFO ][o.e.n.Node ] [mycompany-etl] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.VV6nHeo5, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:/var/log/elasticsearch/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Xms15g, -Xmx15g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=oss, -Des.distribution.type=deb]
[2018-10-26T05:11:36,391][INFO ][o.e.p.PluginsService ] [mycompany-etl] loaded plugin [repository-s3]
[2018-10-26T05:11:40,100][INFO ][o.e.d.DiscoveryModule ] [mycompany-etl] using discovery type [zen]
[2018-10-26T05:11:40,601][INFO ][o.e.n.Node ] [mycompany-etl] initialized
[2018-10-26T05:11:40,601][INFO ][o.e.n.Node ] [mycompany-etl] starting ...
[2018-10-26T05:11:40,852][INFO ][o.e.t.TransportService ] [mycompany-etl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-10-26T05:11:43,914][INFO ][o.e.c.s.MasterService ] [mycompany-etl] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {mycompany-etl}{Qt2wq6C2TXalnocWgfp2Uw}{JsiT2G5-TTGoqUQdEC7SlA}{127.0.0.1}{127.0.0.1:9300}
[2018-10-26T05:11:43,919][INFO ][o.e.c.s.ClusterApplierService] [mycompany-etl] new_master {mycompany-etl}{Qt2wq6C2TXalnocWgfp2Uw}{JsiT2G5-TTGoqUQdEC7SlA}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {mycompany-etl}{Qt2wq6C2TXalnocWgfp2Uw}{JsiT2G5-TTGoqUQdEC7SlA}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
[2018-10-26T05:11:43,959][INFO ][o.e.h.n.Netty4HttpServerTransport] [mycompany-etl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-10-26T05:11:43,959][INFO ][o.e.n.Node ] [mycompany-etl] started
[2018-10-26T05:11:43,973][INFO ][o.e.g.GatewayService ] [mycompany-etl] recovered [0] indices into cluster_state
[2018-10-26T05:13:02,218][INFO ][o.e.c.m.MetaDataCreateIndexService] [mycompany-etl] [mat_abpris_20181026] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2018-10-26T05:13:02,610][INFO ][o.e.c.m.MetaDataMappingService] [mycompany-etl] [mat_abpris_20181026/GsjlWFkuT3yrCD2QwEcn7w] create_mapping [_doc]
[2018-10-26T05:13:02,624][INFO ][o.e.c.m.MetaDataMappingService] [mycompany-etl] [mat_abpris_20181026/GsjlWFkuT3yrCD2QwEcn7w] update_mapping [_doc]
[2018-10-26T05:17:08,191][INFO ][o.e.c.m.MetaDataDeleteIndexService] [mycompany-etl] [mat_abpris_20181026/GsjlWFkuT3yrCD2QwEcn7w] deleting index
[2018-10-26T05:24:35,337][INFO ][o.e.c.m.MetaDataUpdateSettingsService] [mycompany-etl] updating number_of_replicas to [0] for indices [mat_abpris_20181026]
[2018-10-26T05:24:35,357][INFO ][o.e.c.s.IndexScopedSettings] [mycompany-etl] updating [index.refresh_interval] from [-1] to [60s]
[2018-10-26T05:25:15,928][INFO ][o.e.s.SnapshotsService ] [mycompany-etl] snapshot [snapshot_mycompany:mat_abpris_20181026/unJsQwyMQeyYlyKZVm6Hrw] started
[2018-10-26T05:25:38,072][INFO ][o.e.s.SnapshotsService ] [mycompany-etl] snapshot [snapshot_mycompany:mat_abpris_20181026/unJsQwyMQeyYlyKZVm6Hrw] completed with state [SUCCESS]
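The restore that then hangs (both on the second cluster and in the delete-and-restore experiment on the original cluster) is kicked off with something along these lines (again a simplified sketch):

POST /_snapshot/snapshot_mycompany/mat_abpris_20181026/_restore
{
  "indices": "mat_abpris_20181026"
}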
GET /_cluster/allocation/explain
{
  "index": "mat_abpris_20181026",
  "shard": 0,
  "primary": true,
  "current_state": "initializing",
  "unassigned_info": {
    "reason": "NEW_INDEX_RESTORED",
    "at": "2018-10-26T05:42:14.615Z",
    "details": "restore_source[snapshot_mycompany/mat_abpris_20181026]",
    "last_allocation_status": "awaiting_info"
  },
  "current_node": {
    "id": "Qt2wq6C2TXalnocWgfp2Uw",
    "name": "mycompany-etl",
    "transport_address": "127.0.0.1:9300"
  },
  "explanation": "the shard is in the process of initializing on node [mycompany-etl], wait until initialization has completed"
}
GET /_cat/recovery?v
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
mat_abpris_20181026 0 3.5m snapshot index n/a n/a 127.0.0.1 mycompany-etl snapshot_mycompany mat_abpris_20181026 15 9 60.0% 15 645607266 116736429 18.1% 645607266 0 0 100.0%
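In case it is relevant, the repository is registered the same way on both clusters, roughly like this (bucket and base_path are placeholders, not the real values):

PUT /_snapshot/snapshot_mycompany
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket",
    "base_path": "elasticsearch/snapshots"
  }
}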