Hello everyone!
I have an ES 6.1.3 installation that works perfectly, but the disk where the indices are stored (let's call it 'Disk A', mounted at '/data-ephemeral') is now full and we can't make it any larger.
So I've decided to add a second, slower disk ('Disk B', mounted at '/data-persistent') and move all the indices from Disk A onto it, to free up space on Disk A for new indices.
As a first step, and as a test, I copied the indices over and tried ES again. The operations I performed were (roughly the commands sketched below):
- stopped the ES service;
- ran an 'rsync -av' between the two disks;
- changed the 'path.data' setting in the 'elasticsearch.yml' configuration file to point to the new disk;
- started the ES service.
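In shell terms, this is roughly what I did (a minimal sketch; the source sub-directory and the systemd service name are assumptions based on a standard package install, the config path comes from the JVM arguments in the log below):

# 1. stop the node
sudo systemctl stop elasticsearch

# 2. copy the data directory to the new disk (-a preserves owner, permissions and timestamps)
sudo rsync -av /data-ephemeral/elasticsearch/ /data-persistent/elasticsearch/

# 3. point path.data at the new disk in /etc/elasticsearch/elasticsearch.yml
#      path.data: /data-persistent/elasticsearch

# 4. start the node again
sudo systemctl start elasticsearch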
At this point I noticed that ES was down and the logs were full of errors. Here is an extract:
[2020-10-28T08:42:49,358][INFO ][o.e.n.Node ] [monitoring-1] initializing ...
[2020-10-28T08:42:49,546][INFO ][o.e.e.NodeEnvironment ] [monitoring-1] using [1] data paths, mounts [[/data-persistent (/dev/xvdf1)]], net usable_space [94.5gb], net total_space [492gb], types [ext4]
[2020-10-28T08:42:49,547][INFO ][o.e.e.NodeEnvironment ] [monitoring-1] heap size [11.9gb], compressed ordinary object pointers [true]
[2020-10-28T08:43:21,649][INFO ][o.e.n.Node ] [monitoring-1] node name [monitoring-1], node ID [CAeJ1WrERdaz56Z45ceZ-Q]
[2020-10-28T08:43:21,650][INFO ][o.e.n.Node ] [monitoring-1] version[6.1.3], pid[14856], build[af51318/2018-01-26T18:22:55.523Z], OS[Linux/4.4.0-128-generic/amd64], JVM[Private Build/OpenJDK 64-Bit Server VM/1.8.0_265/25.265-b01]
[2020-10-28T08:43:21,650][INFO ][o.e.n.Node ] [monitoring-1] JVM arguments [-Xms12g, -Xmx12g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch]
[2020-10-28T08:43:23,880][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [aggs-matrix-stats]
[2020-10-28T08:43:23,881][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [analysis-common]
[2020-10-28T08:43:23,881][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [ingest-common]
[2020-10-28T08:43:23,881][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [lang-expression]
[2020-10-28T08:43:23,882][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [lang-mustache]
[2020-10-28T08:43:23,882][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [lang-painless]
[2020-10-28T08:43:23,882][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [mapper-extras]
[2020-10-28T08:43:23,882][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [parent-join]
[2020-10-28T08:43:23,882][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [percolator]
[2020-10-28T08:43:23,883][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [reindex]
[2020-10-28T08:43:23,883][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [repository-url]
[2020-10-28T08:43:23,883][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [transport-netty4]
[2020-10-28T08:43:23,883][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded module [tribe]
[2020-10-28T08:43:23,885][INFO ][o.e.p.PluginsService ] [monitoring-1] loaded plugin [repository-s3]
[2020-10-28T08:44:18,144][INFO ][o.e.d.DiscoveryModule ] [monitoring-1] using discovery type [zen]
[2020-10-28T08:44:18,769][INFO ][o.e.n.Node ] [monitoring-1] initialized
[2020-10-28T08:44:18,769][INFO ][o.e.n.Node ] [monitoring-1] starting ...
[2020-10-28T08:44:18,979][INFO ][o.e.t.TransportService ] [monitoring-1] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2020-10-28T08:44:23,786][INFO ][o.e.c.s.MasterService ] [monitoring-1] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {monitoring-1}{CAeJ1WrERdaz56Z45ceZ-Q}{S1qNbfDPTYC8BH51uVF6vg}{localhost}{127.0.0.1:9300}
[2020-10-28T08:44:23,791][INFO ][o.e.c.s.ClusterApplierService] [monitoring-1] new_master {monitoring-1}{CAeJ1WrERdaz56Z45ceZ-Q}{S1qNbfDPTYC8BH51uVF6vg}{localhost}{127.0.0.1:9300}, reason: apply cluster state (from master [master {monitoring-1}{CAeJ1WrERdaz56Z45ceZ-Q}{S1qNbfDPTYC8BH51uVF6vg}{localhost}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2020-10-28T08:44:23,826][INFO ][o.e.h.n.Netty4HttpServerTransport] [monitoring-1] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2020-10-28T08:44:23,826][INFO ][o.e.n.Node ] [monitoring-1] started
[2020-10-28T08:44:29,780][INFO ][o.e.m.j.JvmGcMonitorService] [monitoring-1] [gc][11] overhead, spent [271ms] collecting in the last [1s]
[2020-10-28T08:44:30,781][INFO ][o.e.m.j.JvmGcMonitorService] [monitoring-1] [gc][12] overhead, spent [301ms] collecting in the last [1s]
[2020-10-28T08:44:32,788][INFO ][o.e.m.j.JvmGcMonitorService] [monitoring-1] [gc][14] overhead, spent [279ms] collecting in the last [1s]
[2020-10-28T08:45:54,488][WARN ][r.suppressed ] path: /_cat/indices, params: {v=}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:165) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:68) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:45) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.<init>(TransportBroadcastByNodeAction.java:256) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:234) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:79) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.1.3.jar:6.1.3]
{ ... }
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
[2020-10-28T08:49:35,734][INFO ][o.e.g.GatewayService ] [monitoring-1] recovered [2790] indices into cluster_state
[2020-10-28T08:49:35,734][WARN ][o.e.c.s.ClusterApplierService] [monitoring-1] cluster state applier task [apply cluster state (from master [master {monitoring-1}{CAeJ1WrERdaz56Z45ceZ-Q}{S1qNbfDPTYC8BH51uVF6vg}{localhost}{127.0.0.1:9300} committed version [2] source [local-gateway-elected-state]])] took [5m] above the warn threshold of 30s
[2020-10-28T08:49:35,734][WARN ][o.e.c.s.MasterService ] [monitoring-1] cluster state update task [local-gateway-elected-state] took [5m] above the warn threshold of 30s
[2020-10-28T08:49:57,584][INFO ][o.e.n.Node ] [monitoring-1] stopping ...
[2020-10-28T08:49:58,010][WARN ][o.e.g.GatewayAllocator$InternalPrimaryShardAllocator] [monitoring-1] [s3api-2019.07.10][0]: failed to list shard for shard_started on node [CAeJ1WrERdaz56Z45ceZ-Q]
org.elasticsearch.action.FailedNodeException: Failed node [CAeJ1WrERdaz56Z45ceZ-Q]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:239) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:153) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:211) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1056) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:264) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
Caused by: org.elasticsearch.transport.TransportException: transport stopped, action: internal:gateway/local/started_shards[n]
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:263) ~[elasticsearch-6.1.3.jar:6.1.3]
... 5 more
{ ... }
[2020-10-28T08:49:58,547][INFO ][o.e.n.Node ] [monitoring-1] stopped
[2020-10-28T08:49:58,657][INFO ][o.e.n.Node ] [monitoring-1] closing ...
[2020-10-28T08:49:58,421][WARN ][o.e.g.GatewayAllocator$InternalPrimaryShardAllocator] [monitoring-1] [route53-2019.06.07][2]: failed to list shard for shard_started on node [CAeJ1WrERdaz56Z45ceZ-Q]
org.elasticsearch.action.FailedNodeException: Failed node [CAeJ1WrERdaz56Z45ceZ-Q]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:239) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:153) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:211) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1056) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:264) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.3.jar:6.1.3]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.3.jar:6.1.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
Caused by: org.elasticsearch.transport.TransportException: transport stopped, action: internal:gateway/local/started_shards[n]
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:263) ~[elasticsearch-6.1.3.jar:6.1.3]
... 5 more
{ ... }
I also noticed this kind of error:
[WARN ][o.e.g.MetaStateService ] [monitoring-1] [[route53-2019.01.06/w3OI-k1PQmiboIhtGaiCrA]]: failed to write index state
org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an external force: NativeFSLock(path=/data-persistent/elasticsearch/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive invalid],creationTime=2018-07-03T08:11:51.821555Z)
[WARN ][o.e.e.NodeEnvironment ] [monitoring-1] lock assertion failed
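If it helps, these are the checks I can run on the new path (a minimal sketch; the 'elasticsearch' sub-directory comes from the lock path in the log, everything else is generic and may need adapting):

# ownership and permissions of the copied data tree and the lock file
ls -ld /data-persistent/elasticsearch /data-persistent/elasticsearch/nodes/0
ls -l  /data-persistent/elasticsearch/nodes/0/node.lock

# confirm the new filesystem is mounted read-write and hasn't logged I/O errors
mount | grep /data-persistent
dmesg | tail -n 50

# confirm no other process (e.g. a leftover instance or a backup job) holds files under the new path open
sudo lsof +D /data-persistent/elasticsearch | head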
Did I miss a step? Do you have any hint about what's going on?
Thank you very much