Unable to bring up 3-node, 300 GB Elasticsearch setup from Docker volume

Hi all,

I'm trying to bring up a 3-node Elasticsearch cluster with default settings that holds 300 GB of data in a single index. I copied the Elasticsearch data volumes (named es01, es02, es03) from one machine to another, and I'm running the cluster there using Docker Compose, but I'm facing the issue below. Can someone please help?

{"@timestamp":"2023-04-12T14:24:28.570Z", "log.level": "WARN", "message":"[1950b00985_ipdr][0] marking and sending shard failed due to [failed recovery]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es03][generic][T#7]","log.logger":"org.elasticsearch.indices.cluster.IndicesClusterStateService","elasticsearch.cluster.uuid":"F4EDDFruTzO3v3HwP9oYOg","elasticsearch.node.id":"YT9x_Fw6R6q1ulmi5r5JDA","elasticsearch.node.name":"es03","elasticsearch.cluster.name":"gms-elastic-docker-cluster","error.type":"org.elasticsearch.indices.recovery.RecoveryFailedException","error.message":"[1950b00985_ipdr][0]: Recovery failed on {es03}{YT9x_Fw6R6q1ulmi5r5JDA}{c79-QGYPSM2lZixPzmApmw}{es03}{172.27.0.5}{172.27.0.5:9300}{cdfhilmrstw}{ml.allocated_processors_double=16.0, xpack.installed=true, ml.machine_memory=999997440, ml.allocated_processors=16, ml.max_jvm_size=402653184}","error.stack_trace":"org.elasticsearch.indices.recovery.RecoveryFailedException: [1950b00985_ipdr][0]: Recovery failed on {es03}{YT9x_Fw6R6q1ulmi5r5JDA}{c79-QGYPSM2lZixPzmApmw}{es03}{172.27.0.5}{172.27.0.5:9300}{cdfhilmrstw}{ml.allocated_processors_double=16.0, xpack.installed=true, ml.machine_memory=999997440, ml.allocated_processors=16, ml.max_jvm_size=402653184}\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$24(IndexShard.java:3123)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:170)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:377)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:170)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:447)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:88)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2264)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\nCaused by: [1950b00985_ipdr/Yltkqus-Q0WQJ9BKHTM6pg][[1950b00985_ipdr][0]] org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:468)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:90)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:444)\n\t... 
8 more\nCaused by: [1950b00985_ipdr/Yltkqus-Q0WQJ9BKHTM6pg][[1950b00985_ipdr][0]] org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngine.(InternalEngine.java:239)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngine.(InternalEngine.java:194)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:14)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1949)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1913)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:463)\n\t... 10 more\nCaused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=ByteBufferIndexInput(path="/usr/share/elasticsearch/data/indices/Yltkqus-Q0WQJ9BKHTM6pg/0/index/_osi.cfs"))\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:584)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:534)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.codecs.lucene90.Lucene90CompoundReader.(Lucene90CompoundReader.java:87)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat.getCompoundReader(Lucene90CompoundFormat.java:85)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1245)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1263)\n\tat org.apache.lucene.core@9.4.2/org.apache.lucene.index.IndexWriter.(IndexWriter.java:1118)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2382)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngine.createWriter(InternalEngine.java:2370)\n\tat org.elasticsearch.server@8.6.2/org.elasticsearch.index.engine.InternalEngine.(InternalEngine.java:232)\n\t... 15 more\n"}
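For reference, the root cause at the bottom of that stack trace is the Lucene `CorruptIndexException: codec footer mismatch (file truncated?)` on the `_osi.cfs` segment file, meaning the on-disk index data itself is damaged. One way to ask the cluster why a shard is unassigned is the allocation explain API; a minimal sketch, assuming the cluster is reachable on localhost:9200 over plain HTTP without authentication (8.x enables TLS and auth by default, so adjust accordingly):

```bash
# Explain the first unassigned shard the cluster finds
curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# Or ask about a specific shard of the problem index from the log above
curl -s -X POST "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index": "1950b00985_ipdr", "shard": 0, "primary": true}'
```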

This is not supported and will not work, sorry to say.

Hi warkolm, thanks for the answer. May I ask why it is not supported? It is just a Docker volume, and Elasticsearch should be independent of that, right?

Technically this will work as long as you copy the data volumes while the cluster is completely shut down. The simplest explanation for this error is that the cluster was still running while you copied the data, and that definitely will not work. If the cluster was shut down when you made the copy, then you have some other problem in your storage instead; see these docs for more details.
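For example, one safe way to move a named Docker volume between machines is to stop the cluster first and archive the volume contents through a throwaway container. A rough sketch, assuming a volume named es01 (matching the names in the question) and an illustrative target host called newhost:

```bash
# On the source machine: stop the cluster completely first
docker compose down

# Archive the volume contents via a throwaway container
docker run --rm -v es01:/data -v "$PWD":/backup alpine \
  tar czf /backup/es01.tar.gz -C /data .

# Copy the archive to the target machine
scp es01.tar.gz newhost:

# Then, on the target machine, restore into a fresh volume
docker volume create es01
docker run --rm -v es01:/data -v "$PWD":/backup alpine \
  tar xzf /backup/es01.tar.gz -C /data

# Repeat for es02 and es03, then start the cluster
docker compose up -d
```

Using tar inside a container preserves ownership and permissions, which a plain host-side copy of the volume directory can silently break.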

Hi David, the cluster was down when I took the copy. However, at one point the disk was exhausted while the cluster was running, and I increased the disk size while the cluster was still up. Could that lead to something like this?

The only way that would explain this problem is if your storage does not implement fsync() correctly. That's certainly possible, as many storage systems prefer performance over correctness, but that's not something Elasticsearch can deal with.
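If there is no snapshot to restore from, a last resort is the bundled elasticsearch-shard tool, which can truncate the corrupted portion of a shard at the cost of permanently losing the affected data. A hedged sketch, not a recommendation (restoring from a snapshot is the proper fix); the index name and shard id come from the error log above, and the tool runs interactively and asks for confirmation before deleting anything:

```bash
# Stop the affected node first; the tool must not run against a live shard
docker compose stop es03

# Run the repair tool against the corrupted shard [1950b00985_ipdr][0];
# this DELETES the corrupted data it finds
docker compose run --rm es03 \
  bin/elasticsearch-shard remove-corrupted-data \
    --index 1950b00985_ipdr --shard-id 0

# Restart the node afterwards
docker compose start es03
```

Note that the tool's output describes a follow-up cluster reroute step to re-allocate the truncated shard; follow the official remove-corrupted-data documentation rather than this sketch for the full procedure.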
