Elasticsearch not starting after UID and GID change

Hello! I have a cluster of 2 nodes and I wanted to take snapshots by setting up an NFS server and sharing a directory as the snapshot repository. I had read that the elasticsearch user should have the same UID and GID on all nodes, so I changed them with the commands below on both nodes:

usermod -u 1020 elasticsearch
groupmod -g 1030 elasticsearch

but now I am not able to restart Elasticsearch. Can someone please help me with this?

Do you see any errors in the logs? Not sure we can help much without seeing a bit more detail.

Did you update the ownership of all of Elasticsearch's data too?
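
For example, something along these lines would show whether anything is still owned by the old numeric IDs; usermod and groupmod only change the account entries, they don't touch files already on disk. The 999 and 1000 below are just placeholders for whatever the old UID and GID were on your systems:

# files still owned by the old numeric UID (999 is a placeholder)
find / -xdev -uid 999 -ls

# files still owned by the old numeric GID (1000 is a placeholder)
find / -xdev -gid 1000 -ls

# or simply anything left with no matching user or group at all
# (-xdev stays on one filesystem; repeat per mount if needed)
find / -xdev \( -nouser -o -nogroup \) -ls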

Hi! I will post more details as soon as I can. I have used these commands to change the ownership on both nodes:

chown -R elasticsearch:elasticsearch /etc/elasticsearch
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch
chown -R elasticsearch:elasticsearch /var/log/elasticsearch
chown -R elasticsearch:elasticsearch /usr/share/elasticsearch

I have also tried root:elasticsearch ownership on these directories. It still won't restart.
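
For what it's worth, this is roughly how I am checking that the change actually took effect; the paths are just the standard package-install locations from the commands above:

# confirm the account now has the new IDs
id elasticsearch

# numeric owner/group of the Elasticsearch directories (should match the IDs above)
ls -ldn /etc/elasticsearch /var/lib/elasticsearch /var/log/elasticsearch /usr/share/elasticsearch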

These are the last log lines on node-1:

[2021-02-22T13:11:47,923][WARN ][r.suppressed             ] [node-1] path: /_snapshot/test, params: {pretty=true, repository=test}
org.elasticsearch.repositories.RepositoryVerificationException: [test] [[VNjwVdvSTqSzW5Ggxn3k0A, 'RemoteTransportException[[node-2][192.168.xx.xxx:9300][inte$
        at org.elasticsearch.repositories.VerifyNodeRepositoryAction.finishVerification(VerifyNodeRepositoryAction.java:120) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.repositories.VerifyNodeRepositoryAction.access$000(VerifyNodeRepositoryAction.java:49) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.repositories.VerifyNodeRepositoryAction$1.handleException(VerifyNodeRepositoryAction.java:109) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1172) [elasticsearch-7.9.3.jar:7.$
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1172) [elasticsearch-7.9.3.jar:7.$
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:235) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:233) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:225) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:115) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.9.3.jar:7.9.3]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:76) [transport-netty4-client-7.9.3.jar$
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.$
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.$
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Fi$
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.$
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.$
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Fi$
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]

[2021-02-22T13:27:52,200][INFO ][o.e.c.r.a.AllocationService] [node-1] updating number_of_replicas to [0] for indices [.monitoring-kibana-7-2021.02.17, .kiba$
[2021-02-22T13:27:52,202][INFO ][o.e.c.s.MasterService    ] [node-1] node-left[{node-2}{VNjwVdvSTqSzW5Ggxn3k0A}{o7aZPvy4RvW1nkJ6Bn7k_A}{192.168.xx.xxx}{192.1$
[2021-02-22T13:27:52,225][INFO ][o.e.c.s.ClusterApplierService] [node-1] removed {{node-2}{VNjwVdvSTqSzW5Ggxn3k0A}{o7aZPvy4RvW1nkJ6Bn7k_A}{192.168.xx.xxx}{19$
[2021-02-22T13:27:52,236][INFO ][o.e.c.r.DelayedAllocationService] [node-1] scheduling reroute for delayed shards in [59.9s] (4 delayed shards)
[2021-02-22T13:27:52,298][WARN ][o.e.c.r.a.AllocationService] [node-1] [.kibana_task_manager_1][0] marking unavailable shards as stale: [hwd2UE1KQG24rY7Rx569$
[2021-02-22T13:27:52,683][WARN ][o.e.c.r.a.AllocationService] [node-1] [ilm-history-2-000001][0] marking unavailable shards as stale: [5CGBAphQRJyy88-qQXbfoA]
[2021-02-22T13:27:56,350][WARN ][o.e.c.r.a.AllocationService] [node-1] [.monitoring-es-7-2021.02.22][0] marking unavailable shards as stale: [pqCcDYO6Rqa3hx1$
[2021-02-22T13:28:01,338][WARN ][o.e.c.r.a.AllocationService] [node-1] [.monitoring-kibana-7-2021.02.22][0] marking unavailable shards as stale: [8-v4A8mVQ3-$
[2021-02-22T13:28:01,898][WARN ][o.e.c.r.a.AllocationService] [node-1] [metricbeat-7.9.3-2021.02.17-000001][0] marking unavailable shards as stale: [gr_ZA4M1$
[2021-02-22T13:28:17,936][WARN ][o.e.c.r.a.AllocationService] [node-1] [.tasks][0] marking unavailable shards as stale: [YJkFmARiQmKk0S3nVi3uLg]
[2021-02-22T13:28:18,076][WARN ][o.e.c.r.a.AllocationService] [node-1] [.security-7][0] marking unavailable shards as stale: [tk2cqtK5QgakcaoSPQ-cnA]
[2021-02-22T13:28:18,854][INFO ][o.e.n.Node               ] [node-1] stopping ...
[2021-02-22T13:28:18,858][INFO ][o.e.x.w.WatcherService   ] [node-1] stopping watch service, reason [shutdown initiated]
[2021-02-22T13:28:18,859][INFO ][o.e.x.w.WatcherLifeCycleService] [node-1] watcher has stopped and shutdown
[2021-02-22T13:28:18,863][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [node-1] [controller/42418] [Main.cc@154] ML controller exiting
[2021-02-22T13:28:18,865][INFO ][o.e.x.m.p.NativeController] [node-1] Native controller process has stopped - no new native processes can be started
[2021-02-22T13:28:19,683][INFO ][o.e.n.Node               ] [node-1] stopped
[2021-02-22T13:28:19,683][INFO ][o.e.n.Node               ] [node-1] closing ...
[2021-02-22T13:28:19,698][INFO ][o.e.n.Node               ] [node-1] closed
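
For context, the repository itself is a shared filesystem ("fs") repository on the NFS mount, registered roughly like this; /mnt/es_backups is a placeholder for the actual mount point, which is also listed under path.repo in elasticsearch.yml on both nodes:

curl -X PUT "localhost:9200/_snapshot/test?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups"
  }
}'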

This node was stopped gracefully, and deliberately, by an external influence; the stopping/stopped/closed lines at the end of your log show a normal, orderly shutdown rather than a crash.
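
If you're not sure what issued that stop, the systemd journal normally records it. Assuming a systemd-managed install with the default unit name, something like this should narrow it down:

# recent events for the Elasticsearch unit, including start/stop requests
journalctl -u elasticsearch --since "2021-02-22" --no-pager

# current state of the unit and its last exit status
systemctl status elasticsearch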

I thought the UID/GID change had stopped it. :thinking: Do I have to reinstall Elasticsearch on both nodes?

Without understanding what's wrong, it's impossible to say what needs to be done to fix it.

