Metricbeat on VM can't connect to Elasticsearch in another VM

Hi all.

I am new to Elastic and have read many of the proposed solutions to the problem I am experiencing, without any success. Maybe I can get some more direct help here.

Scenario:

  • Lenovo machine running Ubuntu 18.04.3 and VirtualBox 6.0 as a host.
  • 2 VMs running Ubuntu 18.04.3, each with a bridged adapter network connection (they get their IPs from the Wi-Fi router). Both are running on the same host.
  • VM1 (Elasticsearch running, Metricbeat running and getting data OK) IP=10.0.0.19
  • VM2 (Metricbeat running) IP=10.0.0.106

They ping each other OK.

From Metricbeat VM2 I keep getting...

VM2~$ sudo metricbeat test output
elasticsearch: http://10.0.0.19:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.0.0.19
dial up... ERROR dial tcp 10.0.0.19:9200: connect: connection refused.

VM2~$ sudo metricbeat -e
2019-08-30T11:37:53.747-0700 ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(http://10.0.0.19:9200)): Get http://10.0.0.19:9200: dial tcp 10.0.0.19:9200: connect: connection refused
2019-08-30T11:37:53.747-0700 INFO pipeline/output.go:93 Attempting to reconnect to backoff(elasticsearch(http://10.0.0.19:9200)) with 54 reconnect attempt(s)
2019-08-30T11:37:53.748-0700 INFO [publisher] pipeline/retry.go:189 retryer: send unwait-signal to consumer
2019-08-30T11:37:53.748-0700 INFO [publisher] pipeline/retry.go:191 done
2019-08-30T11:37:53.748-0700 INFO [publisher] pipeline/retry.go:166 retryer: send wait signal to consumer
2019-08-30T11:37:53.748-0700 INFO [publisher] pipeline/retry.go:168 done

Configurations:

VM1 configuration for elasticsearch.yml file:

network.host: ["localhost"]
http.port: 9200

VM2 configuration for metricbeat.yml file:

output.elasticsearch:
  hosts: ["10.0.0.19:9200"]

The configuration seems pretty straightforward, so I assume it is a network or firewall issue.

VM1~$ sudo ufw status
Status: active
To                         Action      From
Anywhere                   ALLOW       10.0.0.0/24

VM2~$ sudo ufw status
Status: inactive

VM1~$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp0s3
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp0s3

VM2~$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp0s3
10.0.0.0        0.0.0.0         255.255.255.0   U     100    0        0 enp0s3
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp0s3

VM1~$ curl 10.0.0.19:9200
curl: (7) Failed to connect to 10.0.0.19 port 9200: Connection refused

VM1~$ curl 0.0.0.0:9200
{
  "name" : "VirtualBox",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "DsgGfrrHTmeptq0aQeanDQ",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

VM1~$ curl localhost:9200
{
  "name" : "VirtualBox",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "DsgGfrrHTmeptq0aQeanDQ",
  "version" : {
    "number" : "7.3.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4749ba6",
    "build_date" : "2019-08-19T20:19:25.651794Z",
    "build_snapshot" : false,
    "lucene_version" : "8.1.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

VM2~$ curl 10.0.0.19:9200
curl: (7) Failed to connect to 10.0.0.19 port 9200: Connection refused
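The curl results above separate two cases: a listener answers on localhost, while 10.0.0.19 refuses the connection even from VM1 itself. A minimal Python sketch of the same check follows, assuming only the standard library; the function name `probe` is hypothetical and not part of any Elastic tooling. A "refused" result means the host is reachable but nothing accepts connections on that address and port, whereas a "timeout" would point more toward dropped packets.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The target answered with a TCP reset: reachable, but no listener there.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else (for example, network unreachable).
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        sock.close()
```

Calling `probe("127.0.0.1", 9200)` and `probe("10.0.0.19", 9200)` on VM1 would be expected to mirror the two curl results shown above.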

I have read plenty of answers and docs and tried different things by changing parameters and configurations and still the same results. I have also tried installing and running Metricbeat from a different machine in the same subnet and same results. Is there any other configuration I am missing?

Any help or hint is much appreciated.

Hi @ajaque and welcome :)

As Elasticsearch is not reachable on this IP, even from the same machine, I'd say that the problem is that Elasticsearch is not listening on this IP. It only listens on local interfaces by default.
You would need to set network.host to fix this.
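To illustrate the point about listening addresses, here is a deliberately simplified Python model of how a bound address decides which destinations a listener answers. The helper `listener_accepts` is hypothetical, and the model ignores details such as dual-stack sockets and firewalls.

```python
import ipaddress


def listener_accepts(bind_addr: str, dest_addr: str) -> bool:
    """Simplified model: would a socket bound to bind_addr answer a connection sent to dest_addr?"""
    bind_ip = ipaddress.ip_address(bind_addr)
    dest_ip = ipaddress.ip_address(dest_addr)
    # A wildcard bind (0.0.0.0 or ::) answers on every local address.
    if bind_ip.is_unspecified:
        return True
    # Otherwise only connections sent to exactly the bound address are answered.
    return bind_ip == dest_ip


# Bound to loopback: VM2's request to 10.0.0.19 is not answered.
print(listener_accepts("127.0.0.1", "10.0.0.19"))  # → False
# Bound to the wildcard address: the same request is answered.
print(listener_accepts("0.0.0.0", "10.0.0.19"))    # → True
```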


Thanks Jaime for your response...

This is what I just did...

  1. Before any changes I had the port listening...

VM1~$ sudo netstat -an | grep -E "9200"
tcp 0 0 127.0.0.1:37138 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36860 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36856 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36846 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:37124 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36848 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36868 127.0.0.1:9200 ESTABLISHED
tcp 0 0 127.0.0.1:36858 127.0.0.1:9200 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 :::* LISTEN
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36848 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36860 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:37138 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36856 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36858 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:37124 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36846 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36868 ESTABLISHED

  2. So I changed elasticsearch.yml file on VM1 to...

network.host: ["0.0.0.0"]

  3. After every change I restarted and queried the elasticsearch service. It failed to run...

VM1~$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2019-08-30 13:03:51 PDT; 1s ago
Docs: http://www.elastic.co
Process: 20023 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pi
Main PID: 20023 (code=exited, status=78)

Aug 30 13:02:48 VirtualBox systemd[1]: Started Elasticsearch.
Aug 30 13:02:51 VirtualBox elasticsearch[20023]: OpenJDK 64-Bit Server VM warning: Option UseConcMar
Aug 30 13:03:51 VirtualBox systemd[1]: elasticsearch.service: Main process exited, code=exited, stat
Aug 30 13:03:51 VirtualBox systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

  4. Then I changed the file to...

network.host: ["10.0.0.19"]

  5. I restarted and queried the elasticsearch service again. It failed to run...

VM1~$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2019-08-30 13:08:02 PDT; 30s ago
Docs: http://www.elastic.co
Process: 21528 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=78)
Main PID: 21528 (code=exited, status=78)

Aug 30 13:07:20 VirtualBox systemd[1]: Started Elasticsearch.
Aug 30 13:07:22 VirtualBox elasticsearch[21528]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely
Aug 30 13:08:02 VirtualBox systemd[1]: elasticsearch.service: Main process exited, code=exited, status=78/n/a
Aug 30 13:08:02 VirtualBox systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

And in both cases I keep getting the same error from VM2

VM2~$ sudo metricbeat test output
elasticsearch: http://10.0.0.19:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.0.0.19
dial up... ERROR dial tcp 10.0.0.19:9200: connect: connection refused
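For reference, the netstat output from the first step already shows where the listener is bound: the only LISTEN line is on 127.0.0.1:9200. A small Python sketch that pulls this out of such output is below. The function `listening_addresses` is hypothetical, and it assumes the six-column layout shown in this thread (protocol, receive queue, send queue, local address, foreign address, state).

```python
from typing import List

# Three lines copied from the netstat output earlier in this thread.
SAMPLE = """\
tcp 0 0 127.0.0.1:37138 127.0.0.1:9200 ESTABLISHED
tcp6 0 0 127.0.0.1:9200 :::* LISTEN
tcp6 0 0 127.0.0.1:9200 127.0.0.1:36848 ESTABLISHED
"""


def listening_addresses(netstat_output: str, port: int) -> List[str]:
    """Return the local addresses that are in LISTEN state on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip anything that is not a six-column line in LISTEN state.
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6-style addresses such as ":::9200" also work.
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append(host)
    return found


print(listening_addresses(SAMPLE, 9200))  # → ['127.0.0.1']
```

A result containing only 127.0.0.1 means remote clients such as VM2 cannot connect; 0.0.0.0, :: or 10.0.0.19 would be needed in that list.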

I have also checked configurations in the WIFI Router and no FW or ACLs are being used.

I ran tcpdump on VM1 (Elasticsearch) and I see this every time I run the test output from VM2. I am not sure it gives any relevant information beyond showing that the TCP packets reach VM1...

VM1~$ sudo tcpdump -vvv | grep "10.0.0.106"
tcpdump: listening on enp0s3, link-type EN10MB (Ethernet), capture size 262144 bytes
10.0.0.106.57672 > VirtualBox.9200: Flags [S], cksum 0xe538 (correct), seq 1190161585, win 64240, options [mss 1460,sackOK,TS val 2313883796 ecr 0,nop,wscale 7], length 0
VirtualBox.9200 > 10.0.0.106.57672: Flags [R.], cksum 0xe278 (correct), seq 0, ack 1190161586, win 0, length 0
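The capture above does carry useful information: the SYN from VM2 (`[S]`) is answered immediately with a reset (`[R.]`), which is what a host typically sends when nothing listens on the requested address and port. A small Python sketch of that reading of tcpdump flag strings follows. The function `classify_reply` is hypothetical, and the interpretation is a simplification, since a firewall configured to reject with a TCP reset would look the same.

```python
from typing import Optional


def classify_reply(reply_flags: Optional[str]) -> str:
    """Interpret the tcpdump flags of the reply to a SYN, e.g. 'S.', 'R.', or None for no reply."""
    if reply_flags is None:
        return "no reply: packet dropped or host unreachable"
    flags = set(reply_flags)
    if "R" in flags:
        # RST (optionally with ACK, shown as '.'): the connection was actively refused.
        return "reset: nothing is listening on that address and port (or a firewall rejects it)"
    if {"S", "."} <= flags:
        # SYN plus ACK: a listener accepted the handshake.
        return "syn-ack: a listener accepted the connection"
    return "unexpected reply"


print(classify_reply("R."))
```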

Umm, I am not sure if network.host accepts lists of IPs, did you try with this?

network.host: '0.0.0.0'

I tried that too... same results.

Weirdly enough, I just tried this combination of parameters and it seems to be working now...

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: ["localhost"]
#network.host: 0.0.0.0
#network.host: "0.0.0.0"
#network.host: '0.0.0.0'
#network.host: 10.0.0.19
network.bind_host: 0.0.0.0
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# Set a transport port for communication between nodes
#
#transport.port: 9200
#
# For more information, consult the network module documentation.
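A possible explanation for why this combination works while a plain `network.host: 0.0.0.0` failed: the thread never shows the Elasticsearch log, so the cause of exit status 78 is not confirmed, but Elasticsearch documents that its bootstrap checks are enforced once the transport layer is reachable on a non-loopback address. Keeping `transport.host: localhost` while binding wider would avoid that. The Python sketch below models this under simplifying assumptions: the helper names are hypothetical, the lookup order is approximate, and publish addresses and `discovery.type` are ignored.

```python
import ipaddress
from typing import Dict

# Names treated as loopback in this sketch; "_local_" is Elasticsearch's special value.
LOOPBACK_NAMES = {"localhost", "_local_"}


def is_loopback(host: str) -> bool:
    """True if host is a loopback name or a loopback IP literal."""
    if host in LOOPBACK_NAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostname: assume it is not loopback


def transport_is_external(settings: Dict[str, str]) -> bool:
    """Simplified model: does the transport layer end up on a non-loopback address?"""
    # More specific transport settings win over the general network settings.
    for key in ("transport.bind_host", "transport.host", "network.bind_host", "network.host"):
        if key in settings:
            return not is_loopback(settings[key])
    return False  # with nothing set, Elasticsearch binds to loopback


# The setting that failed to start in this thread:
print(transport_is_external({"network.host": "0.0.0.0"}))  # → True
# The combination that worked:
print(transport_is_external({"network.bind_host": "0.0.0.0", "transport.host": "localhost"}))  # → False
```

If this reading is right, the Elasticsearch log on VM1 from the failed starts should name the specific bootstrap check that failed.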

I don't know if this is an optimal or elegant solution to my issue, but it seems to work.

I will add new VMs and see if they connect without issues. I guess it should be OK now as long as I keep the same configuration in all of them.

Thank you very much Jaime.
