Index restore in pending task

Hello experts,

I have tried to restore a snapshot (taken with ES 1.4.4), about 488 GB in size, into ES 2.4.0 (a 5-node cluster with a 30 GB heap).
For the last 3 days it has been sitting in the queue with no progress:
```
{"tasks":[{"insert_order":55678,"priority":"NORMAL","source":"restore_snapshot[mybackup]","executing":true,"time_in_queue_millis":262369377,"time_in_queue":"3d"}]}
```
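
For reference, the output above comes from the pending tasks API; a minimal way to keep monitoring it (commands assume the default localhost:9200, adjust host/port as needed):

```
# Pending cluster-state tasks (the stuck restore task shows up here)
curl -s 'localhost:9200/_cluster/pending_tasks?pretty'

# Per-shard restore/recovery progress, if any has started
curl -s 'localhost:9200/_cat/recovery?v'
```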

Can anyone help here?

Yikes!

I've seen similar things when the index had allocation filtering on it. Though I don't think the behavior is exactly the same. Either way, it is worth checking first.
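
A quick way to rule that out on the source 1.4.4 cluster (hypothetical index name; default host/port assumed):

```
# Any index.routing.allocation.* entries here would indicate allocation filtering
curl -s 'localhost:9200/my_index/_settings?pretty'

# Cluster-level allocation settings/filters
curl -s 'localhost:9200/_cluster/settings?pretty'
```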

Can you post a snapshot of the running threads taken with jstack?
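
Roughly like this, run as the user that owns the Elasticsearch process (the PID below is illustrative):

```
# Find the Elasticsearch PID, then dump all thread stacks to a file
jps -l              # or: ps aux | grep -i elasticsearch
jstack 12345 > es-threads.txt
```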

Hi Nik,

It's very long, so please find some of the lines below. Please suggest an alternate way to migrate the data if you can think of one.

"elasticsearch[node-name-node-name][clusterService#updateTask][T#1]" #53 daemon prio=5 os_prio=0 tid=0x00007f85b51e6800 nid=0x979 runnable [0x00007f7d20f6d000]
java.lang.Thread.State: RUNNABLE
at java.lang.Object.<init>(Object.java:37)
at java.util.zip.Deflater.<init>(Deflater.java:168)
at org.elasticsearch.common.compress.deflate.DeflateCompressor.streamOutput(DeflateCompressor.java:126)
at org.elasticsearch.common.compress.CompressedXContent.<init>(CompressedXContent.java:83)
at org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:209)
at org.elasticsearch.index.mapper.DocumentMapper.updateFieldType(DocumentMapper.java:385)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:413)
- locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
- locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.checkMappingsCompatibility(MetaDataIndexUpgradeService.java:302)
at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.upgradeIndexMetaData(MetaDataIndexUpgradeService.java:116)
at org.elasticsearch.snapshots.RestoreService$1.execute(RestoreService.java:228)
at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x00007f7dea543e40> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"elasticsearch[node-name-node-name][[http_server_boss.default]][T#1]{New I/O server boss #34}" #52 daemon prio=5 os_prio=0 tid=0x00007f85b5169000 nid=0x978 runnable [0x00007f7d2106e000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00007f7de9ef6b28> (a sun.nio.ch.Util$2)
- locked <0x00007f7de9ef6b10> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007f7dea301bb0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
at org.jboss.netty.channel.socket.nio.NioServerBoss.select(NioServerBoss.java:163)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:212)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x00007f7dea531250> (a java.util.concurrent.ThreadPoolExecutor$Worker)
```

https://gist.github.com/

This is the important one:

"elasticsearch[node-name-node-name][clusterService#updateTask][T#1]" #53 daemon prio=5 os_prio=0 tid=0x00007f85b51e6800 nid=0x979 runnable [0x00007f7d20f6d000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Object.<init>(Object.java:37)
        at java.util.zip.Deflater.<init>(Deflater.java:168)
        at org.elasticsearch.common.compress.deflate.DeflateCompressor.streamOutput(DeflateCompressor.java:126)
        at org.elasticsearch.common.compress.CompressedXContent.<init>(CompressedXContent.java:83)
        at org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:209)
        at org.elasticsearch.index.mapper.DocumentMapper.updateFieldType(DocumentMapper.java:385)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:413)
        - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
        - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
        at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.checkMappingsCompatibility(MetaDataIndexUpgradeService.java:302)
        at org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.upgradeIndexMetaData(MetaDataIndexUpgradeService.java:116)
        at org.elasticsearch.snapshots.RestoreService$1.execute(RestoreService.java:228)
        at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
        at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

It looks like it is attempting to restore the index. I don't know this code super well, but unless you had told me it wasn't working, I'd assume from the stack trace that it was. Does that thread change as time goes on?

I'm not really sure what is up, frankly. Are you seeing long GC pauses or something?
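
One way to check, if it helps (default host/port assumed): per-node GC counts and times are reported under jvm.gc.collectors in the node stats.

```
curl -s 'localhost:9200/_nodes/stats/jvm?pretty'
```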

Yes, I can see that ip:9200/_nodes/hot_threads changes as time goes on. But last time it took 9 days and nothing happened. This time I have enabled shard recovery debug logging and started the restore again, but nothing shows up in the log; it just keeps going.
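
For reference, this is roughly how recovery logging can be raised dynamically (the exact logger name may need adjusting):

```
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "logger.indices.recovery": "DEBUG" }
}'
```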

For a rolling upgrade, I tried copying the index folder from the 1.4.4 node (/etc/es/clustername/nodes/0/indices/) into the target 2.4.0 location under a different name and restarting the node, but only the process started. The node didn't join the cluster, and I can see the same jstack trace there as well.

I have also tried to re-index using Logstash's elasticsearch input and output plugins. It works fine for all the other indexes, but this index (488 GB) takes too much time.

As of now I have no GC logs; let me enable GC logging and collect them. Please suggest a better, tested approach if you have one for migrating this amount of data.

@Srivastavark, where are these nodes running? Locally or in AWS? Which repository are you using? Did you change any throttling settings? Do you see any progress with a recovery monitoring tool such as _cat/recovery? When you run jstack several times, does the stack trace that @nik9000 posted disappear, or does it stay there?
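
For reference, roughly the checks I have in mind (default host/port assumed):

```
# Shard-level restore/recovery progress
curl -s 'localhost:9200/_cat/recovery?v'

# What the cluster-state update thread is doing right now
curl -s 'localhost:9200/_nodes/hot_threads'
```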

Hi Igor,

The nodes are running on AWS. We took the backup from 1.4.4 to S3, then copied it onto all nodes of the 2.4.0 cluster (also running on AWS) in the local filesystem, and we are restoring from that fs repository.
Please find the settings below.

{"persistent":{"threadpool":{"bulk":{"size":"16"},"index":{"size":"32"}},"indices":{"recovery":{"concurrent_streams":"5","max_bytes_per_sec":"500mb"}}},"transient":{"cluster":{"routing":
{"allocation":{"enable":"all"}}}}}

Nothing shows up in _cat/recovery for the index, but _nodes/hot_threads changes on each hit.

Cluster details:
3 master-data nodes, 2 data nodes.
All master-data and data nodes have 64 GB RAM and 1.29 TB of disk space, except one master-data node, which has 32 GB RAM and 660 GB of space.
Heap is set to no more than half of the available memory.

YML config - all defaults except the following:

```
cluster.name: myname
node.name: prodnode1
node.master: true
node.data: true
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
path.repo: /data/datas3/elastic-search-backup
bootstrap.memory_lock: true
network.host: hostIP
discovery.zen.ping.unicast.hosts: ["master-data", "master-data", "master-data"]
discovery.zen.minimum_master_nodes: 2
```

Could you post a few examples of hot threads? It's important to understand how they are changing.

Will add jstack.

It seems to be stuck in the following place on the master node:

       ...
       org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:381)
       org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
       org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.checkMappingsCompatibility(MetaDataIndexUpgradeService.java:302)
       org.elasticsearch.cluster.metadata.MetaDataIndexUpgradeService.upgradeIndexMetaData(MetaDataIndexUpgradeService.java:116)
       org.elasticsearch.snapshots.RestoreService$1.execute(RestoreService.java:228)
       org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
       org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
       org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
       org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
       org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       java.lang.Thread.run(Thread.java:745)

Could you run hot threads a few more times to see if it will get out of this loop? How many indices are you trying to restore from this snapshot?

Only one index; the snapshot size is 248 GB.

I can see these two frames on each _nodes/hot_threads request:

org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:413)
org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)

We have more than 150,000 document types in our mapping.
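
For reference, a rough way to count the types from the mapping output (hypothetical index name; assumes the source 1.4.4 cluster is reachable):

```
curl -s 'localhost:9200/my_index/_mapping' \
  | python -c 'import json,sys; m=json.load(sys.stdin); print(sum(len(v["mappings"]) for v in m.values()))'
```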

Jstack, 1st hit:

   at org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:209)
        at org.elasticsearch.index.mapper.DocumentMapper.updateFieldType(DocumentMapper.java:385)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:413)
        - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
        - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)

Jstack, 2nd hit:

 at org.elasticsearch.index.mapper.object.ObjectMapper.toXContent(ObjectMapper.java:609)
    at org.elasticsearch.index.mapper.Mapping.toXContent(Mapping.java:141)
    at org.elasticsearch.index.mapper.DocumentMapper.toXContent(DocumentMapper.java:394)
    at org.elasticsearch.common.compress.CompressedXContent.<init>(CompressedXContent.java:88)
    at org.elasticsearch.index.mapper.DocumentMapper.<init>(DocumentMapper.java:209)
    at org.elasticsearch.index.mapper.DocumentMapper.updateFieldType(DocumentMapper.java:385)
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:413)
    - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
    - locked <0x00007f7e313703d0> (a org.elasticsearch.index.mapper.MapperService)

Judging from the stack traces you provided, it looks like the problem is in MetaDataIndexUpgradeService, which is trying to upgrade your index mapping from 1.4 to 2.4 and either goes into an infinite loop or simply takes a very, very long time for some reason. I briefly reviewed the code and I don't see any place that could cause an infinite loop, yet something in the mapping of your index is causing this issue.

I couldn't find any related issues, but it might be worth trying the restore on 2.4.4 to see whether this issue was fixed there. If that also fails, I can try reproducing the issue locally, but for that I would need the following three files from your repository:

```
ROOT
|- snapshot-NAME_OF_THE_SNAPSHOT
|- meta-NAME_OF_THE_SNAPSHOT
|- indices/
   | - INDEX_NAME
       |- meta-NAME_OF_THE_SNAPSHOT
```

Here NAME_OF_THE_SNAPSHOT is the name of the snapshot you are trying to restore and INDEX_NAME is the name of the index in this snapshot. These files contain your cluster's persistent settings and templates, and the index mappings and settings. If you can share this information, please send it to me via a private message or email (my email is my first name at elastic.co).
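
If it is easier, those three files can be bundled from the repository root roughly like this (placeholder names; substitute your actual snapshot and index names):

```
cd /data/datas3/elastic-search-backup    # the path.repo location from the config above
tar czf snapshot-metadata.tgz \
    snapshot-NAME_OF_THE_SNAPSHOT \
    meta-NAME_OF_THE_SNAPSHOT \
    indices/INDEX_NAME/meta-NAME_OF_THE_SNAPSHOT
```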

Many thanks, Igor. We have 2.4.0 in production, but I will try to validate on 2.4.4 as well and will update here. I have easily migrated all the other indexes using Logstash and snapshot/restore.
