ElasticSearch deployment is failing in Azure from Marketplace

ElasticSearch deployment is failing in Microsoft Azure Marketplace.

Snippet of the error.
https://pastebin.com/uYpTce3T

Below is the link to files shared by Microsoft Azure Support.

https://support.microsoft.com/en-us/files?workspace=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ3c2lkIjoiNjc2OWU3OWUtZGU2ZS00ZDk2LThhODYtZTBkYzhkMTBhNzY2Iiwic3IiOiIxMTkwMzI2MjEwMDMxODgiLCJhcHBpZCI6ImU2ZWU0M2ViLTBmYmMtNDU0Ni1iYzUyLTRjMTYxZmNkZjRjNCIsInN2IjoidjEiLCJycyI6IkV4dGVybmFsIiwid3RpZCI6ImUzOTRkN2ExLTlhYmEtNDVlNi05YTA0LTVkZmNiMzhlMjNlYSIsImlzcyI6Imh0dHBzOi8vYXBpLmR0bW5lYnVsYS5taWNyb3NvZnQuY29tIiwiYXVkIjoiaHR0cDovL3NtYyIsImV4cCI6MTU2MTQ5ODQzMywibmJmIjoxNTUzNzIyNDMzfQ.BDfJmjuq2IZjAGdl4XATF3TrxpSQjgklM5zoMG5zVhIseIDNzQNhoopnrfsc8mDguioLSODw6iC5nYmD6uoEm2BPidC-bdPuc4W_eASIQWV69CouKkbu-1iV9O0BfcjvEdjh2DGpn5p1nBAXreINib7FpoARPhUrfrKj3aPI9pZkkN9mm9uYo-ZG4F7sM5_taGtFSHceX8rvE4Iqu1vZF8gyt6FAOZOvPlBahMI0M4bTk68umpLKYC3XqOVACu5-cTMzIZFaw3gYiSYHzJZGuv-UZrW-wYPPXrN-J9K2javrjVZBYohjy48pCNX1WVoCckr9faUDgtQLdmcMHbsk7w&wid=6769e79e-de6e-4d96-8a86-e0dc8d10a766

The error message indicates that Elasticsearch was not seen to be running within a given wait period during the deployment. The log files on the VM should provide more details about the underlying cause. You can access the Elasticsearch VMs via SSH through either Kibana or a Jumpbox.

I have attached the screenshots of deployment status from Azure.
For log files from "xdex-master-0", please go to the link specified earlier from support.microsoft.com.

@kyogesh91 the screenshot of failed deployments doesn't provide any more information beyond the information in the pastebin link you've provided.

To get to the root cause, you would need to SSH into one of the failed nodes (e.g. start with xdex-master-0) and inspect the log files on the VM. There are several common reasons why a cluster may fail to form, for example:

  1. The vnet to which you're attaching the cluster is using custom DNS servers as opposed to Azure DNS, which are unable to resolve by hostname on the vnet. The template assumes that Azure DNS is used, or that custom DNS servers are configured to be able to resolve by hostname.

  2. A transient error downloading one of the template dependencies such as an apt package or the Elasticsearch RPM package. It doesn't look like the latter in this case as the process would have failed much earlier.

  3. An error related to DNS resolution on the VMs

I don't see a network resource in the screenshot above, which makes me suspect you're attaching to an existing vnet that may be using custom DNS servers. If this is the case, can you check the DNS server logs?

I am sorry, I thought I had shared the files.

Below is the link which contains the log files you asked for

https://10xts-my.sharepoint.com/:f:/p/yogesh/Elwp1U4ILQtEi0GlYN5s39cBOppdEXNLdZ9rGRCzd_-hyw?e=v6lGE7

Please see my previous reply about vnets and custom DNS servers. The error in the elastic-.log file

[2019-03-30T15:48:22,121][ERROR][o.e.b.Bootstrap          ] [xdex-master-1] Exception
java.lang.IllegalArgumentException: No up-and-running site-local (private) addresses found, got [name:lo (lo), name:eth0 (eth0)]
	at org.elasticsearch.common.network.NetworkUtils.getSiteLocalAddresses(NetworkUtils.java:184) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.network.NetworkService.resolveInternal(NetworkService.java:218) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.network.NetworkService.resolveInetAddresses(NetworkService.java:192) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.network.NetworkService.resolveBindHostAddresses(NetworkService.java:108) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.transport.TcpTransport.bindServer(TcpTransport.java:373) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.transport.netty4.Netty4Transport.doStart(Netty4Transport.java:136) ~[?:?]
	at org.elasticsearch.xpack.core.security.transport.netty4.SecurityNetty4Transport.doStart(SecurityNetty4Transport.java:98) ~[?:?]
	at org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4ServerTransport.doStart(SecurityNetty4ServerTransport.java:43) ~[?:?]
	at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:65) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.transport.TransportService.doStart(TransportService.java:229) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:65) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.node.Node.start(Node.java:716) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:269) ~[elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:342) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) [elasticsearch-6.6.1.jar:6.6.1]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) [elasticsearch-6.6.1.jar:6.6.1]

indicates that you're using custom DNS servers. Per the documentation link in point 1 above, these will need to be configured to perform hostname resolution.
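To make the exception text easier to read: it says that none of the interfaces listed (lo, eth0) had an up-and-running site-local (private) address. As a rough sketch of what "site-local" means for IPv4, assuming the standard private ranges that Java's `InetAddress.isSiteLocalAddress()` checks, the following Python helper classifies addresses. The function names are made up for illustration and are not part of Elasticsearch or the template.

```python
import ipaddress

# IPv4 private ranges treated as site-local (assumption: mirrors the
# ranges checked by Java's InetAddress.isSiteLocalAddress()).
SITE_LOCAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_site_local(address: str) -> bool:
    """Return True if the IPv4 address falls in one of the private ranges."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        # This sketch only covers IPv4.
        return False
    return any(ip in network for network in SITE_LOCAL_NETWORKS)


def site_local_addresses(addresses):
    """Filter a list of address strings down to the site-local ones."""
    return [address for address in addresses if is_site_local(address)]
```

On the VM you could feed this the addresses reported by `ip addr` for eth0. An empty result would be consistent with the exception in the log.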

[Screenshot from 2019-04-01 20-56-53]

This is what the settings have been in Azure

Is that for the vnet to which the VMs are attached?

If you're on a VM on the vnet, can you resolve (e.g. ping) another VM by hostname?


I was able to ping other master nodes and VMs in the vnet.

Can you ping them by hostname? The template does not use IP addresses because they are dynamically assigned.
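If ping is awkward to interpret, a small script run from one of the VMs can show whether each hostname resolves at all. This is only a sketch: `resolve_hostnames` is a hypothetical helper, and it uses whatever resolver the VM is configured with, so it reflects the vnet's DNS settings.

```python
import socket


def resolve_hostnames(hostnames):
    """Map each hostname to its resolved IPv4 address, or None on failure."""
    results = {}
    for name in hostnames:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (a subclass of OSError) is raised when the
            # configured DNS servers cannot resolve the name.
            results[name] = None
    return results
```

For example, `resolve_hostnames(["xdex-master-0", "xdex-master-1"])` should return an IP address for each name if hostname resolution works on the vnet, and `None` for any name that cannot be resolved.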

I am able to do it

The first hostname pinged is not part of the cluster, so let's ignore that one for now.

Since you can ping VMs by hostname, I would suggest deleting the failed deployment resources (if they're in their own resource group, you can delete the resource group) and redeploying.

I have tried your suggestion at least three times. Same result.
I have uploaded the template that is generated by the UI for reference.

https://10xts-my.sharepoint.com/:f:/p/yogesh/Elwp1U4ILQtEi0GlYN5s39cBOppdEXNLdZ9rGRCzd_-hyw?e=v6lGE7

I'm fairly sure the issue is in the existing vnet that the cluster is being connected to. An easy way to see if this is the case is to deploy a vnet with the cluster instead of attaching it to an existing one.

Elasticsearch HTTP layer accepts incoming HTTP requests on port 9200, and the Transport layer accepts TCP connections on port 9300. Can you communicate with other VMs on the network from a VM using these ports?
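One way to test this from a VM is a short script that attempts a TCP connection to each port. This is a minimal sketch with hypothetical helper names. Note that a connection only succeeds if something is listening on the target, so a refused connection to a node where Elasticsearch failed to start does not by itself prove that the network is blocking traffic. A timeout is a stronger hint that a security group or firewall rule is dropping packets.

```python
import socket

# Ports mentioned above: HTTP on 9200, transport on 9300.
ES_PORTS = {"http": 9200, "transport": 9300}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False


def check_elasticsearch_ports(host, timeout=3.0):
    """Check both Elasticsearch ports on a host and report each result."""
    return {
        name: can_connect(host, port, timeout)
        for name, port in ES_PORTS.items()
    }
```

For example, `check_elasticsearch_ports("xdex-master-1")` run from xdex-master-0 would return a dictionary with a `True` or `False` entry for `"http"` and `"transport"`.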

I tried to use a new Vnet and it worked.
The existing VNET security group settings are attached.
