Elastic nodes started reporting hardware errors on ESXi 8.0.1c servers

After upgrading our ESXi servers from 7.0.3l to 8.0.1c, some of the Elastic clusters started to report hardware errors during index hash. If we move the problematic Elastic VMs back to ESXi servers running the old version (7.0.3l), the problem goes away. What do we need to check on the VM or in the Elastic application to understand the root cause?

Hello and welcome,

It is not clear if this is an issue with Elasticsearch or with your infrastructure.

Is the Elasticsearch service running on those VMs reporting an error, or is the error in the VM itself?

The VMs run on the ESXi cluster, and the Elastic services run on those VMs. The hardware errors appeared in the Elastic logs.
Actually, I am asking in order to understand whether this is related to VMware ESXi or to Elastic.

Which logs? Your system logs or the Elastic service logs? You need to share them.

From what you shared, this seems to be an ESXi issue, not an Elastic issue.

The Elastic server log looks like the one below.

Yeah, those Elasticsearch errors are normally related to underlying hardware issues. You may need to investigate on the ESXi side, but I'm not sure how.

Yeah, this is pretty much certain to be infrastructural. See "Troubleshooting corruption" in the Elasticsearch Guide [8.10] for more info.
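
To narrow down which nodes (and therefore which ESXi hosts) are affected, it can help to pull the corruption-related lines out of each node's server log. The following is only a minimal sketch: the marker strings are an assumption based on the wording Lucene commonly uses for checksum failures, the helper names `find_corruption_lines` and `scan_log_file` are hypothetical, and the log path depends on your installation.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers: Lucene usually reports checksum mismatches through
# CorruptIndexException with wording such as "checksum failed (hardware problem?)".
# Adjust these patterns to match what your own logs actually contain.
CORRUPTION_PATTERNS = [
    re.compile(r"CorruptIndexException"),
    re.compile(r"checksum failed", re.IGNORECASE),
    re.compile(r"hardware problem", re.IGNORECASE),
]


def find_corruption_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line matching a corruption marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in CORRUPTION_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: str) -> List[Tuple[int, str]]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_corruption_lines(handle)
```

Running `scan_log_file` against the server log of each node and noting which ESXi host that node's VM was on at the time gives a quick picture of whether the errors follow the 8.0.1c hosts.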

Hello @DavidTurner and @leandrojmp

Given the following article, can we say "it should be related to Elasticsearch"?
In our case, only one thing changed: the ESXi version. The guest OS version is the same, and the Elastic VMs run on the same bare-metal servers.

No, the bug in the article you linked was entirely in the Linux kernel. Its only relationship with Elasticsearch was that we discovered it.
