Cluster state update task

Hello everyone,

When I start up the Docker containers for my ELK stack, Elasticsearch takes quite a while on some cluster state update tasks.

What are these tasks doing?
I only have one node, not a real cluster. Are they necessary? They slow down the startup tremendously.

This is an example log message:

[WARN ][o.e.c.s.MasterService ] [adzrBlG] cluster state update task [shard-started StartedShardEntry{shardId [[logstash2-2018.12][1]], allocationId [8Cj1XFgoSF6ms4q1E4Q0Bw], message [after existing store recovery; bootstrap_history_uuid=false]}[StartedShardEntry{shardId [[logstash2-2018.12][1]], allocationId [8Cj1XFgoSF6ms4q1E4Q0Bw], message [after existing store recovery; bootstrap_history_uuid=false]}], shard-started StartedShardEntry{shardId [[logstash-2019.10-02][0]], allocationId [11aBkGwjQvKAuE2IO1qMGw], message [after existing store recovery; bootstrap_history_uuid=false]}[StartedShardEntry{shardId [[logstash-2019.10-02][0]], allocationId [11aBkGwjQvKAuE2IO1qMGw], message [after existing store recovery; bootstrap_history_uuid=false]}]...

Thank you in advance,
Yahuu

Yes. The particular task you quoted is starting some shards so that they can be searched, which seems important to me. But they shouldn't really take very long.

Can you share the output of GET _cluster/health?

I just restarted, and right now the health is red:

{
"cluster_name" : "docker-cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 58,
"active_shards" : 58,
"relocating_shards" : 0,
"initializing_shards" : 20,
"unassigned_shards" : 95,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 17,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 378096,
"active_shards_percent_as_number" : 33.52601156069364
}

After some time the shards get assigned and the status changes to green.
I feel that it takes too much time. I switched from my old node (running an older Elasticsearch version) to Docker, and I didn't have that problem on the old node.

I am unsure what the problem is.

Nor I. It should not take nearly this long to start this cluster. Some tasks have been waiting longer than 6 minutes to complete.

Are you using local disks for storage, or are you using something network-attached?

Can you share all the logs that the node writes from when it starts until when the cluster is fully green? Use https://gist.github.com/ as it'll be too large for a forum post.

I am using a local disk.

This time I started only Elasticsearch and Kibana, without Logstash, and the cluster state update task didn't show up again.

But it still takes around 8 minutes for the cluster health to change to "yellow" (yellow because I have set replicas to 1 on some indices, but I only have one node).
Is that normal?
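
(For reference, on a single-node setup the replica count can be dropped to 0 so the health can reach green; a minimal sketch, assuming the usual logstash-* index names:

PUT /logstash-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

That only changes the colour, though, not the startup time.)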

Here are my logs from the startup:

Do you think it has something to do with insufficient resources? Or maybe the size or number of shards? But I had the same-sized shards on my old node, and it wasn't such a big problem there. I also feel like querying takes a bit longer than before.

If you need any other input, let me know.

No, it's definitely not normal for this to take 8 minutes. I was hoping for some kind of logged message indicating what was running slowly but I see only silence for those 8 minutes.

Could you add logger.org.elasticsearch.cluster.service: TRACE to the elasticsearch.yml file and restart it? That should start to tell us a bit more about what's happening in that missing time.
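
For reference, that is a single line in elasticsearch.yml (the path is assumed to be the default /usr/share/elasticsearch/config/elasticsearch.yml inside the container):

# enable trace logging for the cluster service
logger.org.elasticsearch.cluster.service: TRACE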

Could you also get some thread dumps by calling GET _nodes/hot_threads?threads=9999 a few times during those 8 minutes?

The only slightly strange thing I noticed in your log was that the JVM options contain -Xms1g, -Xmx1g, -Xmx10g and -Xms10g. It's picking up the larger, 10g values, but you probably want to get rid of the 1g ones to avoid confusion in future.

Yes, I noticed that too and was a bit confused. I set it in my docker-compose.yml as an environment variable, but I see it is using 10g of memory.
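
For reference, a minimal docker-compose.yml sketch of how the heap is set there (service name and image tag are just examples); the 1g values most likely come from the default jvm.options shipped with the image, and the later 10g flags win:

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.7.1
    environment:
      # appends -Xms10g -Xmx10g after the defaults from jvm.options
      - "ES_JAVA_OPTS=-Xms10g -Xmx10g"

Removing the 1g defaults from jvm.options (or mounting a custom jvm.options) would get rid of the duplicate flags.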

The logs with logger.org.elasticsearch.cluster.service: TRACE are in startup.log.
I put the GET _nodes/hot_threads?threads=9999 responses in hot_threads.txt.

Thanks. From what I can see this looks like a normal startup: the node is busy starting up all the shards, reading data from disk etc. Can you tell us a bit more about the cluster, i.e. how many indices and shards you have, and how much data there is in total?

I actually switched from an SSD to an HDD. I have made some space and will switch back to an SSD to see if that changes anything. I will test it next week and update here.

I have around 125GB of data.
Here is my output for the indices info:

I plan to actually reindex from weekly indices to monthly indices, because I have a lot of small indices. Is that a good idea?

That could well explain the difference. Starting Elasticsearch can be quite I/O intensive, so I'm not surprised that a spinning disk is slower.

Yes, these indices are mostly very small, and fewer larger indices would be preferable. I also think you should consider reducing the number of shards in each index from 5 to 1.
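
For example, shrinking one of these indices down to a single shard would look roughly like this (the index names are just placeholders; the source index has to be marked read-only first):

PUT /logstash-2018.35/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

POST /logstash-2018.35/_shrink/logstash-2018.35_shrinked
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

Once the shrunk copy is healthy, the original index can be deleted.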

Thank you very much for the help so far David :grinning:

The SSD changed a lot. It now takes about 2-3 minutes for the node to be ready. That's more reasonable now, right?

I tried to shrink my indices but ran into problems.
The new index created by the shrink operation is marked as red.

GET /_cluster/allocation/explain gave the following response:

{
  "index" : "logstash-2018.35_shrinked",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2019-04-08T15:59:48.615Z",
    "failed_allocation_attempts" : 1,
    "details" : "failed shard on node [MEqShQSoRoqabapHo385Fw]: failed to create shard, failure IndexNotFoundException[no such index]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "MEqShQSoRoqabapHo385Fw",
      "node_name" : "MEqShQS",
      "transport_address" : "172.18.0.2:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [1] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-04-08T15:59:48.615Z], failed_attempts[1], delayed=false, details[failed shard on node [MEqShQSoRoqabapHo385Fw]: failed to create shard, failure IndexNotFoundException[no such index]], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}

This is what I found in the logs:

A FileSystemException was thrown.
I already retried the allocation with POST /_cluster/reroute?retry_failed=true, with the same result.
There should also be enough storage left... I am not sure what the problem is.

Nor I:

FileSystemException[/usr/share/elasticsearch/data/nodes/0/indices/iI_tnAUtSri2X3P-4sm18w/0/index/_7.cfs: Invalid argument

Sounds like something went wrong reading this file, or maybe writing it, but I've never seen this before. Shrinking relies on hard-linking the index files, but most file systems support this just fine.

Try deleting the shrunk index logstash-2018.35_shrinked and shrinking it again, and if it fails for the same reason then it'd be good if you could find the full stack trace containing that FileSystemException and share it here.

I tried deleting and shrinking again, and also tried it with other shards.

The stacktrace should be at line 15+

Thanks, at least that tells us a bit more. Does this file exist: /usr/share/elasticsearch/data/nodes/0/indices/iI_tnAUtSri2X3P-4sm18w/0/index/_7.cfs? Is it readable (e.g. try sha1sum /usr/share/elasticsearch/data/nodes/0/indices/iI_tnAUtSri2X3P-4sm18w/0/index/_7.cfs)? Does dmesg show any kernel messages about filesystem problems?

sha1sum response:
sha1sum: /usr/share/elasticsearch/data/nodes/0/indices/iI_tnAUtSri2X3P-4sm18w/0/index/_7.cfs: No such file or directory

dmesg doesn't work right now. I'll try to fix it so I can see the kernel messages:
dmesg: read kernel buffer failed: Operation not permitted

Oh, hang on: the logs you shared above are dated 2019-04-08T16:07:56,775, a few days ago. There should be more recent messages from your later attempt at the shrink.

Yes, that is the old log, but the ones from today are the same:

[2019-04-11T12:31:04,944][WARN ][o.e.i.c.IndicesClusterStateService] [-BmHCz1] [[logstash-2018.35_shrinked6][0]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [logstash-2018.35_shrinked6][0]: Recovery failed on {-BmHCz1}{-BmHCz1qSZ2qqCUw33JjUw}{NWDdRyRmR6Kd7_6SNmQIUA}{172.18.0.2}{172.18.0.2:9300}
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$12(IndexShard.java:2455) [elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.7.1.jar:6.7.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from local shards
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromLocalShards$3(StoreRecovery.java:140) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromLocalShards(StoreRecovery.java:124) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromLocalShards(IndexShard.java:1725) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$12(IndexShard.java:2450) ~[elasticsearch-6.7.1.jar:6.7.1]
... 4 more
Caused by: java.nio.file.FileSystemException: /usr/share/elasticsearch/data/nodes/0/indices/IrvCj5v7T1OKKdj-a1HHjg/0/index/_7.cfs: Invalid argument
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:182) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:292) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:345) ~[?:?]
at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:460) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:332) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:287) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:84) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4824) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3297) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3465) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3430) ~[lucene-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:16:28]
at org.elasticsearch.index.shard.StoreRecovery.addIndices(StoreRecovery.java:189) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromLocalShards$3(StoreRecovery.java:132) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:310) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.StoreRecovery.recoverFromLocalShards(StoreRecovery.java:124) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.recoverFromLocalShards(IndexShard.java:1725) ~[elasticsearch-6.7.1.jar:6.7.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$12(IndexShard.java:2450) ~[elasticsearch-6.7.1.jar:6.7.1]
... 4 more

OK, same questions about this file:

/usr/share/elasticsearch/data/nodes/0/indices/IrvCj5v7T1OKKdj-a1HHjg/0/index/_7.cfs

Does it exist? What does stat /usr/share/elasticsearch/data/nodes/0/indices/IrvCj5v7T1OKKdj-a1HHjg/0/index/_7.cfs say? How about sha1sum /usr/share/elasticsearch/data/nodes/0/indices/IrvCj5v7T1OKKdj-a1HHjg/0/index/_7.cfs?

Ah, sorry about the old log...

Same response again for both: No such file or directory

I wasn't able to resolve the problem with shrinking my indices.
Thank you, David, for your time and help anyway :slight_smile:

However, what I did instead was reindex the index into a new index with one shard.
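
For reference, this is roughly what that looks like (the index names here are just examples):

PUT /logstash-2018.35-monthly
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

POST /_reindex
{
  "source": { "index": "logstash-2018.35" },
  "dest": { "index": "logstash-2018.35-monthly" }
}

Once the new index looks complete, the old weekly indices can be deleted to free up the shards.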