Elastic won't start on Windows after inserting a single document

Hi everybody,
I've just downloaded ES 7.15.1 and the latest Kibana. I was able to start ES for the first time. Then I added a single document and stopped Kibana and ES.
Now I'm not able to start ES. In the logs I see this error:

[2021-11-02T08:31:57,041][WARN ][o.e.c.r.a.AllocationService] [DESKTOP-ONTB2GB] failing shard [failed shard, shard [.kibana_task_manager_7.15.1_001][0], node[4vuRTRA5RmuN7_YZRoHFzA], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=TIHAjaqbQfWqZAnEqXkLGg], unassigned_info[[reason=CLUSTER_RECOVERED], at[2021-11-02T07:31:55.332Z], delayed=false, allocation_status[fetching_shard_data]], message [shard failure, reason [corrupt file (source: [start])]], failure [CorruptIndexException[codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=MMapIndexInput(path="C:\dev\tools\elasticsearch-7.15.1\data\nodes\0\indices\21kWgK9zTTu897fGTbeRuA\0\index\_15_Lucene84_0.doc"))]], markAsStale [true]]
org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=MMapIndexInput(path="C:\dev\tools\elasticsearch-7.15.1\data\nodes\0\indices\21kWgK9zTTu897fGTbeRuA\0\index\_15_Lucene84_0.doc"))
        at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:490) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.<init>(Lucene84PostingsReader.java:85) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.codecs.lucene84.Lucene84PostingsFormat.fieldsProducer(Lucene84PostingsFormat.java:438) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:315) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:395) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:114) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:83) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:171) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:213) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:571) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:108) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:629) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:121) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:97) ~[lucene-core-8.9.0.jar:8.9.0 05c8a6f0163fe4c330e93775e8e91f3ab66a3f80 - mayyasharipova - 2021-06-10 17:50:37]
        at org.elasticsearch.index.engine.InternalEngine.createReaderManager(InternalEngine.java:635) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:257) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:198) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:14) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1766) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1732) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:421) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:87) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:434) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:85) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2034) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-7.15.1.jar:7.15.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]

I haven't changed anything in the configuration file. Any idea what could be causing this problem?

Full logs can be found here: https://uloz.to/tamhle/KAONBI4cJwVt/name/Nahrano-2-11-2021-v-8-43-37#!ZJIwZmR2ZmIuLGuzA2SxAGIvAzL4BKEJq0bkFzIIpwE1JwLmBN==

To me, it looks like the underlying data files that Elasticsearch uses got corrupted.

How did you stop Elasticsearch?

Sometimes, if you just restart Elasticsearch several times and wait, it will recover on its own. Did you let it continue to run? Did it actually come up?

Could you run

curl localhost:9200

Did it respond?

If you really did just get started (and I would never tell you to do this for any non-dev environment), you could just clean out the data directory and Elasticsearch will rebuild it. That is a pretty extreme approach. NOTE: THIS DELETES ALL DOCUMENTS AND DATA IN THE ELASTICSEARCH CLUSTER.

Clean out everything below this:

"C:\dev\tools\elasticsearch-7.15.1\data

A better approach is to let it come up and perhaps delete just the corrupted index. Let us know if Elasticsearch comes up and whether you can run the curl command above.
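For instance (a sketch only, assuming the corrupted index is the .kibana_task_manager one named in your log; deleting it loses its contents, and Kibana should recreate it on its next start):

curl -X DELETE "localhost:9200/.kibana_task_manager_7.15.1_001"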

Hi stephenb,
you are right, it looks like the data files are corrupted. I can always reproduce this.

I stopped it by pressing CTRL-C; I looked into elasticsearch.bat and haven't found a better way to stop it.

I've run the curl command and here is the response:

C:\Users\Lubos>curl localhost:9200
{
  "name" : "DESKTOP-ONTB2GB",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "TpbIQ_f4T02mA4BWfOhmLA",
  "version" : {
    "number" : "7.15.1",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "83c34f456ae29d60e94d886e455e6a3409bba9ed",
    "build_date" : "2021-10-07T21:56:19.031608185Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

It seems to be alive, but I cannot connect with Kibana. In the Elasticsearch console I can see:

Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.kibana_7.15.1_001][0]]. Consider using `allow_partial_search_results` setting to bypass this error.
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:211) ~[elasticsearch-7.15.1.jar:7.15.1]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:424) [elasticsearch-7.15.1.jar:7.15.1]
        ... 82 more

In Kibana's console I can see:

  log   [08:40:31.567] [error][savedobjects-service] Unable to retrieve version information from Elasticsearch nodes. connect ECONNREFUSED 127.0.0.1:9200
  log   [08:40:39.297] [error][savedobjects-service] [.kibana] Action failed with 'connect ECONNREFUSED 127.0.0.1:9200'. Retrying attempt 8 in 64 seconds.

I've tried to start ES three times, always with the same result.
I tried deleting everything inside the data folder and now it starts, but I've lost the document I inserted, which is not good at all.

Logs from this try: https://uloz.to/tamhle/NUOLytV9nURe/name/Nahrano-3-11-2021-v-9-00-29#!ZGSxMwR2AGuyMGWvZJMwZwAyATLlMzIFIJ42DmORIS9hAGLlMt==

First, apologies: apparently I was not clear, and I have edited my post above. If you delete the data directory, that will delete ALL the Elasticsearch data, so do not do that unless you are OK with that. Since you only added one document, I thought you could reload it.

Now I think your Elasticsearch is OK. Looking at your logs, it looks good, but it also looks like you shut it down, I guess:

[2021-11-03T08:55:21,287][INFO ][o.e.c.r.a.AllocationService] [DESKTOP-ONTB2GB] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana-event-log-7.15.1-000001][0]]]).

How many nodes are you running? Are you running Elasticsearch and Kibana on the same desktop?

It appears you are having connectivity issues between Kibana and Elasticsearch. Just to be clear... you are starting Elasticsearch, leaving it running, and then trying to start Kibana, right?

Did you change any settings in elasticsearch.yml or kibana.yml?

I'm running Kibana and a single ES node on my laptop.
When I deleted the data dir I was able to connect; without the deletion I was not.
And yes, I always start ES first and then, after a while, I start Kibana.

No, I made no changes. I've just run elasticsearch.bat and kibana.bat.

Do you have any idea why the index always gets corrupted?

Hi @lubosvr

Please start Elasticsearch, wait, and then run these two commands... and post the results:

curl localhost:9200/_cat/health?v
curl localhost:9200/_cat/indices?v

I do not think the index is corrupted anymore.

This indicates Kibana cannot connect to Elasticsearch; it says nothing about a corrupted index.

The problem is that the index always gets corrupted when I restart ES.

Will you please run the commands... otherwise I cannot help. They will show if anything is corrupted.

Perhaps you should uninstall and reinstall.

Perhaps you have an issue with your disk.

When you stop Elasticsearch, are you hitting Ctrl-C multiple times or just once? You should do it just once... and let it shut down.
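If a shard shows up as red again, the allocation explain API should tell you why it is unassigned (just a suggestion for extra detail):

curl localhost:9200/_cluster/allocation/explain?pretty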

Here is the output before the restart

C:\Users\Lubos>curl localhost:9200/_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1635968609 19:43:29  elasticsearch yellow          1         1      8   8    0    0        1             0                  -                 88.9%

C:\Users\Lubos>curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                pqVX2AdqQuaCh4MdoW3MIw   1   0         41            0     39.6mb         39.6mb
green  open   .kibana_7.15.1_001              OSLtfOJOQjqPpo5qdPhQXg   1   0         21            1      2.3mb          2.3mb
green  open   .apm-custom-link                CZlVKF0PQe2Yc7lWzdMv6w   1   0          0            0       208b           208b
green  open   .kibana-event-log-7.15.1-000001 XbfKue61SuujnX3oArU3Fg   1   0          1            0      5.9kb          5.9kb
green  open   .apm-agent-configuration        P4qRuSkTTGCKuk7ID8jsuQ   1   0          0            0       208b           208b
yellow open   blogposts                       9LHpq1BiQ9GYfFZowAeG_A   1   1          1            0      5.8kb          5.8kb
green  open   .kibana_task_manager_7.15.1_001 3sOBcBRmRxWCdDzVlLWfcA   1   0         15          169      140kb          140kb

And here is the output after the restart

C:\Users\Lubos>curl localhost:9200/_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1635968863 19:47:43  elasticsearch red             1         1      7   7    0    0        2             0                  -                 77.8%

C:\Users\Lubos>curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases                pqVX2AdqQuaCh4MdoW3MIw   1   0         41            0     39.6mb         39.6mb
green  open   .kibana_7.15.1_001              OSLtfOJOQjqPpo5qdPhQXg   1   0         21            1      2.3mb          2.3mb
green  open   .apm-custom-link                CZlVKF0PQe2Yc7lWzdMv6w   1   0          0            0       208b           208b
green  open   .apm-agent-configuration        P4qRuSkTTGCKuk7ID8jsuQ   1   0          0            0       208b           208b
yellow open   blogposts                       9LHpq1BiQ9GYfFZowAeG_A   1   1          1            0      5.9kb          5.9kb
green  open   .kibana-event-log-7.15.1-000001 XbfKue61SuujnX3oArU3Fg   1   0          2            0     11.8kb         11.8kb
red    open   .kibana_task_manager_7.15.1_001 3sOBcBRmRxWCdDzVlLWfcA   1   0

I installed ES just by extracting the ZIP file. I've tried extracting it to a different folder and the behaviour is the same.

I always press CTRL-C only once and wait for the shutdown. It asks me if I want to terminate the batch job and I press Y for yes:

[2021-11-03T20:54:50,921][INFO ][o.e.n.Node               ] [DESKTOP-ONTB2GB] closed
Terminate batch job (Y/N)?

How can I check the disk? I have seen no issues with the disk until now.

I don't think it's your disk.

I think Elasticsearch is not shutting down gracefully.

You should stop Kibana all the way first, before stopping Elasticsearch.

Technically it is then a good idea to run a synced flush; this makes sure all the data is written to disk:

curl -X POST localhost:9200/_flush/synced

The answer should look something like:
{"_shards":{"total":33,"successful":20,"failed":0},".....

Then shut down Elasticsearch.

I'm not the biggest Windows expert; what happens if you answer No to the terminate question?
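For what it's worth, the zip distribution on Windows also ships a service wrapper, which usually gives a cleaner stop than Ctrl-C (a sketch, run from the Elasticsearch install directory):

bin\elasticsearch-service.bat install
bin\elasticsearch-service.bat start
bin\elasticsearch-service.bat stop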

Yes, the explanation about not shutting down gracefully makes sense.
I always stop Kibana first, and only once it has stopped do I stop ES.

I've tried the flush command, but with no luck.

  1. The first try ended with an error:
C:\Users\Lubos>curl -X POST localhost:9200/_flush/synced
{"_shards":{"total":7,"successful":5,"failed":1},".kibana_7.15.1_001":{"total":1,"successful":1,"failed":0},".apm-custom-link":{"total":1,"successful":1,"failed":0},"blogposts":{"total":2,"successful":1,"failed":0},".kibana-event-log-7.15.1-000001":{"total":1,"successful":1,"failed":0},".apm-agent-configuration":{"total":1,"successful":1,"failed":0},".kibana_task_manager_7.15.1_001":{"total":1,"successful":0,"failed":1,"failures":[{"shard":0,"reason":"pending operations","routing":{"state":"STARTED","primary":true,"node":"cbCWMB2xQtaDyKLegO55nw","relocating_node":null,"shard":0,"index":".kibana_task_manager_7.15.1_001","allocation_id":{"id":"RcdhlMyPRsaLMJtO8hlZLg"}}}]}}
  2. The second try was successful:
C:\Users\Lubos>curl -X POST localhost:9200/_flush/synced
{"_shards":{"total":7,"successful":6,"failed":0},".kibana_7.15.1_001":{"total":1,"successful":1,"failed":0},".apm-custom-link":{"total":1,"successful":1,"failed":0},"blogposts":{"total":2,"successful":1,"failed":0},".kibana-event-log-7.15.1-000001":{"total":1,"successful":1,"failed":0},".apm-agent-configuration":{"total":1,"successful":1,"failed":0},".kibana_task_manager_7.15.1_001":{"total":1,"successful":1,"failed":0}}

But the next start failed again :frowning:

C:\Users\Lubos>curl localhost:9200/_cat/health?v
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1636012406 07:53:26  elasticsearch red             1         1      7   7    0    0        2             0                  -                 77.8%

C:\Users\Lubos>curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_7.15.1_001              xUxU6xhiTNOYK9YmpeeFyw   1   0         12            0      2.3mb          2.3mb
green  open   .geoip_databases                5_DV-R_vQ-SizTHIN2G_Gg   1   0         41            0     39.6mb         39.6mb
green  open   .apm-custom-link                cYuinSTzTl6p7qUIcAUrfg   1   0          0            0       261b           261b
green  open   .kibana-event-log-7.15.1-000001 eI_eGjK7RlKdCGtHXw_piw   1   0          2            0     11.9kb         11.9kb
yellow open   blogposts                       XizrnHPJQyy0JSGI8N42GQ   1   1          1            0      5.9kb          5.9kb
green  open   .apm-agent-configuration        ZwtMbsKTSEyNQOpjZjnJQA   1   0          0            0       261b           261b
red    open   .kibana_task_manager_7.15.1_001 GyOUInhiScS2RjLtxmLqgQ   1   0

Logs: https://uloz.to/tamhle/s0hj5q2R9Jhh/name/Nahrano-4-11-2021-v-9-01-34#!ZJWyAGR1AzDmMJV3BGDlZGLkMzMwnTuXBJcInzI6M1MiZGNk

This time I tried answering N to the Terminate batch job (Y/N)? question, and ES stopped anyway.
It seems to have no effect.

I tried stopping Kibana and starting it again without stopping ES, and it works fine. Incorrect stopping of ES seems to be the problem.

I don't think it's this at all. If Elasticsearch shut down abruptly in the middle of writing a file then it would not be trying to read it again. Segment files like the .doc file mentioned above are only read if referenced by a segments_N file, and the segments_N file is written strictly after all the files on which it depends are successfully written, so it can't get into a state where it needs to read an incomplete file no matter how it crashes.

I think it is much more likely that something (not Elasticsearch) is altering the contents of this file after it's been written.
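If it helps to dig further, the cat segments API lists the segments a shard is currently using (an illustrative check only; it works while the shard is assigned):

curl "localhost:9200/_cat/segments/.kibana_task_manager_7.15.1_001?v"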


Well, I find it strange that something is corrupting only the ES node files and nothing else. I have trouble only with ES, and only when ES goes down and comes back up again.

I've tried switching off Windows Defender, but the problem remains.
To me it looks like ES is not working correctly on my Windows laptop :frowning:

@DavidTurner thanks for the explanation. I was pretty sure we have robust handling of the data files, so I'm just at a bit of a loss; it is unclear what is disturbing that index / segment.

It is a bit odd that it's the same index that's corrupted each time.

@lubosvr perhaps try with Docker / docker-compose?
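For a quick single-node test, something along these lines should work (just a sketch; the image tags match your version, the container names here are made up, and you may want to map a data volume):

docker network create elastic
docker run -d --name es01 --net elastic -p 9200:9200 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.15.1
docker run -d --name kib01 --net elastic -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01:9200" docker.elastic.co/kibana/kibana:7.15.1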

I've tried docker-compose with a mapped folder (by the way, the same one I used on Windows) and it works perfectly. Thanks @stephenb!

I've marked your response as the solution, but I'd rather mark it as a workaround. I still suspect the shutdown process is not working correctly on Windows.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.