ElasticSearch Cluster fails 5 minutes after starting on Azure

Hi,

I am trying to build an ElasticSearch cluster on Azure. I have done it successfully for testing purposes with 3 VMs under the same virtual network. It worked very well.

Because of my subscription limits, I distributed those 3 VMs across 3 different subscriptions. The only difference was that the VMs were no longer under the same virtual network; it wasn't possible because of the different subscriptions and the structure of Azure. I used public IPs for my publish_host settings. It works for 5-6 minutes, during which I can create indices and do CRUD operations. After a few minutes the child nodes become unresponsive, _cluster/health does not respond, and I can only reach the master node, which reports the cluster health as GREEN even though it is not.

If I try to create an index, it gets stuck and fails to create shards because of the unresponsive child nodes. I have tried many things over the past week, including different configuration combinations, but I could not find a solution. I checked all the logs; they do not provide any information, even when I set them to DEBUG or TRACE level. All I get is unreachable-node errors from the master node after about 10 minutes. I am posting my configuration details:

Operating System: Ubuntu 16.04, default image provided by Canonical; ports 9200 and 9300 are open.
Java Version: oracle-java8
ElasticSearch: 2.3.3 and 2.3.4, same behavior on both.

I can also provide my logs if you want, but as far as I can tell they contain no clues.

cluster.name: Alpha
node.name: Vulcan
network.host: _eth0_ #tried 0.0.0.0 too. eth0 is a local ip address like 10.0.0.4 or 192.168.0.4 assigned by the nic.
network.publish_host: myes1.westeurope.cloudapp.azure.com # I have to use this parameter when my nodes are not under the same network, but setting this variable creates the problem I explained.
discovery.zen.ping.unicast.hosts: ["myes1.westeurope.com","myes2.westeurope.cloudapp.azure.com","myes3.westeurope.cloudapp.azure.com"]
discovery.zen.minimum_master_nodes: 2
#Paths
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
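
One way to pin down the "master says GREEN but the other nodes do not answer" symptom is to ask every node for _cluster/health separately and compare what each one reports. The following is only a rough sketch, not something taken from the setup above: the helper names fetch_health and summarize are made up for illustration, it assumes HTTP port 9200 is reachable from wherever the script runs, and the hostnames are the ones from the publish_host setting.

```python
import http.client
import json
import urllib.request

# Hostnames as used for publish_host above; replace with your own nodes.
NODES = [
    "myes1.westeurope.cloudapp.azure.com",
    "myes2.westeurope.cloudapp.azure.com",
    "myes3.westeurope.cloudapp.azure.com",
]


def fetch_health(host, port=9200, timeout=5.0):
    """Return the parsed _cluster/health document, or None if the node does not answer."""
    url = "http://{}:{}/_cluster/health".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and socket timeouts are OSError subclasses; ValueError covers bad JSON.
        return None


def summarize(results, expected_nodes):
    """Turn {host: health-or-None} into a list of human-readable problems."""
    problems = []
    for host, health in sorted(results.items()):
        if health is None:
            problems.append("{}: no response".format(host))
        elif health.get("number_of_nodes") != expected_nodes:
            problems.append("{}: sees {} of {} nodes (status {})".format(
                host, health.get("number_of_nodes"), expected_nodes, health.get("status")))
    return problems


if __name__ == "__main__":
    results = {host: fetch_health(host) for host in NODES}
    for line in summarize(results, len(NODES)) or ["all nodes answered and agree on the node count"]:
        print(line)
```

Running this every minute or so after startup should show roughly when the non-master nodes stop answering, and whether the master still counts them in number_of_nodes at that point.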

For all intents and purposes, you are creating a cluster that spans multiple datacenters, which is not a configuration we support:

Even if all of the VMs are in the same region, they will have to reach each other over the internet. The network link for internal communication is vital to Elasticsearch's stability.
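
To get a feel for how that link behaves, something like the sketch below can be run from one node against another. It is an illustration only: it times plain TCP handshakes to the transport port (9300, as listed in the question), the function names connect_time_ms and latency_summary are made up for this example, and a clean result does not prove the link is good enough for a cluster.

```python
import socket
import time


def connect_time_ms(host, port=9300, timeout=3.0):
    """Time one TCP handshake (including DNS lookup) in milliseconds; None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


def latency_summary(samples):
    """Return (failures, min_ms, max_ms) for a list of samples that may contain None."""
    ok = [s for s in samples if s is not None]
    failures = len(samples) - len(ok)
    if not ok:
        return failures, None, None
    return failures, min(ok), max(ok)


if __name__ == "__main__":
    # Hostname taken from the question; replace with the node you want to probe.
    host = "myes2.westeurope.cloudapp.azure.com"
    samples = []
    for _ in range(20):
        samples.append(connect_time_ms(host))
        time.sleep(1.0)
    print("failures=%d min_ms=%s max_ms=%s" % latency_summary(samples))
```

Failed handshakes or a large gap between the minimum and maximum times would be a hint that the nodes are talking over a less predictable path than a shared virtual network gives you.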

Have a look at our Azure Marketplace offering:

https://azure.microsoft.com/en-us/marketplace/partners/elastic/elasticsearchelasticsearch/

It allows you to deploy Elasticsearch in all sorts of topologies. A 3-node Elasticsearch cluster where the nodes are both master-eligible and hold data is possible. You can also opt out of the Kibana VM if you do not need it.

Hi,
Thanks for your reply. I read that blog entry and it finally helped me understand the situation, after I had spent a lot of time on this issue. But a note I read there confused me a bit:
(Note: On AWS, running a cluster across availability zones within a single region is supported as Amazon provides consistent high bandwidth and low latency.)
I am using Azure VMs in the same region/location (West Europe) for my cluster, so what is the reason the same does not apply here? I also considered using a Virtual Network Gateway (included in Azure) and a Docker overlay network, but they are expensive. Is there any other possible solution besides using the same subscription? I need to use different subscriptions because of my monthly credit quota.