Why elasticsearch Downs After Reboot

Hello, dear community,

Recently we faced some critical errors in elasticsearch. The errors down below occurred after we rebooted the machines.

#Issue: Failed to Read keystore Password on Console

The problem was Elasticsearch. It wasn’t starting and kept failing. The error message was:

> [root@master-1 elasticsearch]# systemctl status elasticsearch 
> elasticsearch.service - Elasticsearch   
> Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)   
> Active: failed (Result: exit-code) since Sat 2021-08-14 00:51:01 BST; 29min ago     
> Docs: https://www.elastic.co  
> Process: 15432 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=1/FAILURE) 
> Main PID: 15432 (code=exited, status=1/FAILURE)
> journalctl -xe
> -- Unit elasticsearch.service has begun starting up. 
> Aug 14 00:40:11 master-1  systemd-entrypoint[15145]: Failed to read keystore password on console 
> Aug 14 00:40:11 master-1 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE 
> Aug 14 00:40:11 master-1 systemd[1]: elasticsearch.service: Failed with result 'exit-code'. 
> Aug 14 00:40:11 master-1 systemd[1]: Failed to start Elasticsearch. -- Subject: Unit elasticsearch.service has failed
> Aug 14 06:05:39 master-1 systemd-entrypoint[16920]: /usr/share/elasticsearch/bin/systemd-entrypoint: line 7: /root/tmp.tmp: Permission denied
> Aug 14 06:05:39 master-1 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
> Aug 14 06:05:39 master-1 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
> Aug 14 06:05:39 master-1 systemd[1]: Failed to start Elasticsearch.

#Issue: CorruptIndexException: codec header mismatch (elasticsearch-keystore)

> journalctl -xe
> 
> systemd-entrypoint[4842]: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=1071082590 vs expected header=10710>
> systemd-entrypoint[4842]:         at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:196)
> systemd-entrypoint[4842]:         at org.elasticsearch.common.settings.KeyStoreWrapper.load(KeyStoreWrapper.java:224)
> systemd-entrypoint[4842]:         at org.elasticsearch.common.settings.HasPasswordKeyStoreCommand.execute(HasPasswordKeyStoreCommand.java:42)
> Aug 15 08:15:01 cold-1 systemd-entrypoint[4842]:         at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
> systemd-entrypoint[4842]:         at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
> systemd-entrypoint[4842]:         at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:91)
> systemd-entrypoint[4842]:         at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
> systemd-entrypoint[4842]:         at org.elasticsearch.cli.Command.main(Command.java:90)
> systemd-entrypoint[4842]:         at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:43)
> systemd-entrypoint[4842]: Exception in thread "main" org.elasticsearch.bootstrap.BootstrapException: org.apache.lucene.index.CorruptIndexException: codec header mismatch: a>

We managed to fix these issues but we researched the community about it and we noticed that after rebooting the machines most of the community had similar/same errors. So here are my questions to the elasticsearch and the elastic community.

  1. Why and how did these errors happen?
  2. Is rebooting the machines affects elasticsearch badly?
  3. How can we prevent these errors to occur?
  4. Is elasticsearch has a future update plan about preventing/fixing these after reboot issues for future versions of elasticsearch?
  5. Is stopping elasticsearch before rebooting the machines helps prevent the errors?
  6. Last but not least the errors we got were mostly elasticsearch-keystore errors. Is rebooting affects elasticsearch-keystore badly? If so, how can we prevent this?

I'm looking forward to your replies, please make sure to inform us via this topic.

Welcome to our community! :smiley:

Can you post some of the logs directly from the file, rather than from journalctl?

Thank you @warkolm, loved the community!
Sorry for the late answer. I was looking into our logs. It has like 60k lines logs for a day :smile: anyways here's what I saw repeatedly:

I saw these 2 a lot.

> [2021-08-15T07:21:19,254][INFO ][o.e.x.s.a.AuthenticationService] [master-1] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
> [2021-08-15T07:21:20,603][ERROR][o.e.x.s.a.e.ReservedRealm] [master-1] failed to retrieve password hash for reserved user [elastic]

Full of it:

> [2021-08-15T07:21:27,589][INFO ][o.e.x.s.a.AuthenticationService] [master-1] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
> [2021-08-15T07:21:27,777][ERROR][o.e.x.s.a.e.ReservedRealm] [master-1] failed to retrieve password hash for reserved user [elastic]
> org.elasticsearch.action.UnavailableShardsException: at least one primary shard for the index [.security-7] is unavailable
> 	at org.elasticsearch.xpack.security.support.SecurityIndexManager.getUnavailableReason(SecurityIndexManager.java:181) ~[x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.esnative.NativeUsersStore.getReservedUserInfo(NativeUsersStore.java:525) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.esnative.ReservedRealm.getUserInfo(ReservedRealm.java:212) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.esnative.ReservedRealm.doAuthenticate(ReservedRealm.java:93) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticateWithCache(CachingUsernamePasswordRealm.java:167) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticate(CachingUsernamePasswordRealm.java:104) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$consumeToken$15(AuthenticationService.java:450) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.core.common.IteratingActionListener.run(IteratingActionListener.java:102) [x-pack-core-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeToken(AuthenticationService.java:495) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$extractToken$11(AuthenticationService.java:417) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.extractToken(AuthenticationService.java:427) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$checkForApiKey$3(AuthenticationService.java:368) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.ApiKeyService.authenticateWithApiKeyIfPresent(ApiKeyService.java:324) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.checkForApiKey(AuthenticationService.java:348) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:330) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.TokenService.getAndValidateToken(TokenService.java:396) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:326) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:386) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:397) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:321) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:263) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:141) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:126) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.xpack.security.rest.SecurityRestFilter.handleRequest(SecurityRestFilter.java:61) [x-pack-security-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:236) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:318) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:176) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:329) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:383) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:308) [elasticsearch-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:54) [transport-netty4-client-7.7.0.jar:7.7.0]
> 	at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:29) [transport-netty4-client-7.7.0.jar:7.7.0]
> 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:58) [transport-netty4-client-7.7.0.jar:7.7.0]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) [netty-codec-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [netty-handler-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar:4.1.45.Final]
> 	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar:4.1.45.Final]
> 	at java.lang.Thread.run(Thread.java:832)

I think it's irrelevant but after those authentication logs, without any authentication log I saw these repeatedly:

> 2021-08-15T07:21:29,686][WARN ][r.suppressed             ] [master-1] path: /.kibana/_doc/space%3Adefault, params: {index=.kibana, id=space:default}
> org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.kibana][_doc][space:default]: routing [null]]

Thanks already!

That's not great.
Do you have a snapshot you can restore from?

Yes we do, actually we haven't lost any data etc. We recovered it all.
We just wondered that why this happened and how can we prevent this type of errors?

Hard to say, we'd need to see logs from the master node around the time that this issue happened.