Hi Micheal,
Thanks for the detailed info.
In my case, the Elasticsearch data node pods keep restarting. From the logs I traced it to the error below.
Error
org.elasticsearch.transport.ConnectTransportException: [node-3][10.244.8.2:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1309) ~[elasticsearch-6.8.14.jar:6.8.14]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:101) ~[elasticsearch-6.8.14.jar:6.8.14]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-6.8.14.jar:6.8.14]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-6.8.14.jar:6.8.14]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 10.244.8.2/10.244.8.2:9300
at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
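To rule out a plain networking problem, I was planning to run a quick reachability probe along these lines (just a sketch, run from somewhere that can reach the pod network; the address is the one from the trace above):

import socket

# Transport address of the unreachable node, taken from the stack trace above.
HOST, PORT = "10.244.8.2", 9300

try:
    # create_connection() raises ConnectionRefusedError (an OSError) when nothing
    # is listening on the port, which matches the connect_exception in the log.
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"TCP connect to {HOST}:{PORT} succeeded - something is listening")
except OSError as exc:
    print(f"TCP connect to {HOST}:{PORT} failed: {exc}")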
I'm still not sure what is causing this. I ran the API call below to check the heap and RAM usage on each node:
https://es:9200/_cat/nodes?v&h=ip,heap.current,heap.percent,heap.max,ram.current,ram.percent,ram.max
From the output, it looks like the RAM is almost completely utilized:
Total RAM: 16 GB
Heap size: 14 GB
Could the large heap size, and the small amount of memory left outside the heap, be causing the data node pods to restart?
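For context on the heap question: my understanding is that the usual Elasticsearch guidance is to keep the JVM heap at or below roughly 50% of the node's RAM, so the OS page cache and off-heap allocations have room. Here is a rough sketch of how I'm comparing heap.max to ram.max per node (the host and credentials are placeholders, not my real setup):

import requests

# Placeholder endpoint and credentials - adjust for the actual cluster.
ES_URL = "https://es:9200"
AUTH = ("elastic", "<password>")

resp = requests.get(
    f"{ES_URL}/_cat/nodes",
    params={
        "format": "json",
        "bytes": "b",  # return the size columns as raw byte counts
        "h": "ip,heap.current,heap.percent,heap.max,ram.current,ram.percent,ram.max",
    },
    auth=AUTH,
    verify=False,  # in-cluster self-signed certs; tighten this for production
)
resp.raise_for_status()

for node in resp.json():
    heap_max = int(node["heap.max"])
    ram_max = int(node["ram.max"])
    ratio = heap_max / ram_max
    # Rule of thumb: JVM heap should not exceed ~50% of physical RAM.
    verdict = "heap larger than 50% of RAM" if ratio > 0.5 else "ok"
    print(f"{node['ip']}: heap.max={heap_max} B, ram.max={ram_max} B, "
          f"heap/ram={ratio:.0%} ({verdict})")

With a 14 GB heap on a 16 GB node, that check would flag every data node, which is what prompted my question.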