Metricbeat Unable to Connect to Elasticsearch, DNS Lookup Failed?


(David) #1

Hi, I have a server that hosts Elasticsearch (call it server A, located at 10.150.160.145), and Metricbeat is deployed to many servers on the same network. 9 out of 10 manage to send their Metricbeat data to server A with no problem. The exception is one machine (call it B), which has the exact same metricbeat.yml settings as the other 9 servers.

Here are the lines I'm seeing in the metricbeat logfile on B:

2018-10-29T15:17:16.598+1100	INFO	instance/beat.go:492	Home path: [C:\Program Files\metricbeat] Config path: [C:\Program Files\metricbeat] Data path: [C:\ProgramData\metricbeat] Logs path: [C:\ProgramData\metricbeat\logs]
2018-10-29T15:17:16.604+1100	INFO	instance/beat.go:499	Beat UUID: 7e4d0236-3dbb-4b4d-badf-5b46b4052854
2018-10-29T15:17:16.604+1100	INFO	[beat]	instance/beat.go:716	Beat info	{"system_info": {"beat": {"path": {"config": "C:\\Program Files\\metricbeat", "data": "C:\\ProgramData\\metricbeat", "home": "C:\\Program Files\\metricbeat", "logs": "C:\\ProgramData\\metricbeat\\logs"}, "type": "metricbeat", "uuid": "7e4d0236-3dbb-4b4d-badf-5b46b4052854"}}}
2018-10-29T15:17:16.604+1100	INFO	[beat]	instance/beat.go:725	Build info	{"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:22:44.000Z", "version": "6.3.2"}}}
2018-10-29T15:17:16.604+1100	INFO	[beat]	instance/beat.go:728	Go runtime info	{"system_info": {"go": {"os":"windows","arch":"amd64","max_procs":2,"version":"go1.9.4"}}}
2018-10-29T15:17:16.616+1100	INFO	[beat]	instance/beat.go:732	Host info	{"system_info": {"host": {"architecture":"x86_64","boot_time":"2018-10-04T00:06:06.62+10:00","hostname":"EMPNRAP07","ips":["172.21.174.101/24","::1/128","127.0.0.1/8","fe80::5efe:ac15:ae65/128","fe80::100:7f:fffe/64"],"kernel_version":"6.1.7601.24231 (win7sp1_ldr.180810-0600)","mac_addresses":["00:50:56:81:0f:7e","00:00:00:00:00:00:00:e0","00:00:00:00:00:00:00:e0"],"os":{"family":"windows","platform":"windows","name":"Windows Server 2008 R2 Standard","version":"6.1","major":1,"minor":0,"patch":0,"build":"7601.24241"},"timezone":"AEDT","timezone_offset_sec":39600,"id":"e5ff36fb-2990-42e0-9bb2-812048505a05"}}}
2018-10-29T15:17:16.617+1100	INFO	instance/beat.go:225	Setup Beat: metricbeat; Version: 6.3.2
2018-10-29T15:17:16.617+1100	INFO	elasticsearch/client.go:145	Elasticsearch url: http://10.150.160.145:9200
2018-10-29T15:17:16.617+1100	INFO	pipeline/module.go:81	Beat name: EMPNRAP07
2018-10-29T15:17:16.618+1100	INFO	instance/beat.go:315	metricbeat start running.
2018-10-29T15:17:16.618+1100	INFO	[monitoring]	log/log.go:97	Starting metrics logging every 30s
2018-10-29T15:17:16.619+1100	INFO	cfgfile/reload.go:122	Config reloader started
2018-10-29T15:17:26.624+1100	INFO	helper/privileges_windows.go:62	Metricbeat process and system info: {"OSVersion":{"Major":6,"Minor":1,"Build":7601},"Arch":"amd64","NumCPU":2,"User":{"SID":"S-1-5-18","Account":"SYSTEM","Domain":"NT AUTHORITY","Type":1},"ProcessPrivs":{"SeAssignPrimaryTokenPrivilege":{"enabled":false},"SeAuditPrivilege":{"enabled_by_default":true,"enabled":true},"SeBackupPrivilege":{"enabled":false},"SeChangeNotifyPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateGlobalPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePagefilePrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePermanentPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateSymbolicLinkPrivilege":{"enabled_by_default":true,"enabled":true},"SeDebugPrivilege":{"enabled_by_default":true,"enabled":true},"SeImpersonatePrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseBasePriorityPrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseQuotaPrivilege":{"enabled":false},"SeIncreaseWorkingSetPrivilege":{"enabled_by_default":true,"enabled":true},"SeLoadDriverPrivilege":{"enabled":false},"SeLockMemoryPrivilege":{"enabled_by_default":true,"enabled":true},"SeManageVolumePrivilege":{"enabled":false},"SeProfileSingleProcessPrivilege":{"enabled_by_default":true,"enabled":true},"SeRestorePrivilege":{"enabled":false},"SeSecurityPrivilege":{"enabled":false},"SeShutdownPrivilege":{"enabled":false},"SeSystemEnvironmentPrivilege":{"enabled":false},"SeSystemProfilePrivilege":{"enabled_by_default":true,"enabled":true},"SeSystemtimePrivilege":{"enabled":false},"SeTakeOwnershipPrivilege":{"enabled":false},"SeTcbPrivilege":{"enabled_by_default":true,"enabled":true},"SeTimeZonePrivilege":{"enabled_by_default":true,"enabled":true},"SeUndockPrivilege":{"enabled":false}}}
2018-10-29T15:17:26.624+1100	INFO	helper/privileges_windows.go:70	SeDebugPrivilege is enabled. SeDebugPrivilege=(Default, Enabled)
2018-10-29T15:17:26.625+1100	WARN	[cfgwarn]	service/service.go:32	BETA: The windows service metricset is beta
2018-10-29T15:17:26.625+1100	INFO	cfgfile/reload.go:253	Starting 4 runners ...
2018-10-29T15:17:27.710+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:28.721+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:28.754+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:30.755+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:30.787+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:34.816+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:34.817+1100	INFO	[publish]	pipeline/retry.go:149	retryer: send wait signal to consumer
2018-10-29T15:17:34.817+1100	INFO	[publish]	pipeline/retry.go:151	  done
2018-10-29T15:17:34.850+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host

It claims to have a DNS lookup issue, but the weird thing is that I can ping that server 10.150.160.145 (i.e. server A) from B, and I can even go to 10.150.160.145:9200 directly in the browser on B and get the "You Know, for Search" result!

So why is Metricbeat on this specific server claiming it's a DNS issue? I'm quite at a loss here and could use any tips.

Now, I admit that I am assuming the other 9 servers have the same configuration, so maybe I'm wrong. However, if that's the case, where do I look for configuration to compare? These machines are running Windows Server 2008 and 2012. I checked the DNS servers they are using and they seem very similar, so I don't think that's the problem (and I can connect to 10.150.160.145:9200 directly in my browser on machine B, after all).


(Andrew Cholakian) #2

Can you share your exact config? To me, the messages look like it's trying to look up a host named http. A likely culprit would be that somehow the whole URL isn't being configured and only the http prefix is being used.


(David) #3

Hi Andrew, I'm assuming you are talking about metricbeat.yml. In that case:

###################### Metricbeat Configuration Example #######################

# This file is an example configuration file highlighting only the most common
# options. The metricbeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/metricbeat/index.html

#==========================  Modules configuration ============================

metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: true

  # Period on which files under path should be checked for changes
  reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

#============================= Elastic Cloud ==================================

# These settings simplify using metricbeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["10.150.160.145:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

#============================== Xpack Monitoring ===============================
# metricbeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#xpack.monitoring.enabled: false

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well. Any setting that is not set is
# automatically inherited from the Elasticsearch output configuration, so if you
# have the Elasticsearch output configured, you can simply uncomment the
# following line.
#xpack.monitoring.elasticsearch:

(Andrew Cholakian) #4

Can you try dumping the config with:

.\metricbeat.exe export config

and pasting that here (with anything sensitive redacted). That will make sure we aren't missing anything.


(David) #5

Hi Andrew, here is the output of the config export on the server with the problematic Metricbeat:

metricbeat:
  config:
    modules:
      path: C:\Program Files\metricbeat/modules.d/*.yml
      reload:
        enabled: true
        period: 10s
output:
  elasticsearch:
    hosts:
    - 10.150.160.145:9200
path:
  config: C:\Program Files\metricbeat
  data: C:\Program Files\metricbeat\data
  home: C:\Program Files\metricbeat
  logs: C:\Program Files\metricbeat\logs
setup:
  kibana: null
  template:
    settings:
      index:
        codec: best_compression
        number_of_shards: 1

And this is the config export on a server with a working Metricbeat, which looks similar:

metricbeat:
  config:
    modules:
      path: C:\Program Files\metricbeat/modules.d/*.yml
      reload:
        enabled: true
        period: 10s
output:
  elasticsearch:
    hosts:
    - lgnpvdev5314:9200
path:
  config: C:\Program Files\metricbeat
  data: C:\Program Files\metricbeat\data
  home: C:\Program Files\metricbeat
  logs: C:\Program Files\metricbeat\logs
setup:
  kibana: null
  template:
    settings:
      index:
        codec: best_compression
        number_of_shards: 1

I am aware that the hosts field differs, but 10.150.160.145 is the IP address of "lgnpvdev5314". Initially, the host was set to lgnpvdev5314, but because of the DNS error, I thought I should use the IP address directly instead.


(David) #6

If it helps, just now I changed the host back from 10.150.160.145 to lgnpvdev5314 again on the server with problematic metricbeat, and I'm still seeing the error message, as you can see below:

2018-10-31T09:59:39.789+1100	INFO	instance/beat.go:492	Home path: [C:\Program Files\metricbeat] Config path: [C:\Program Files\metricbeat] Data path: [C:\ProgramData\metricbeat] Logs path: [C:\ProgramData\metricbeat\logs]
2018-10-31T09:59:39.792+1100	INFO	instance/beat.go:499	Beat UUID: 7e4d0236-3dbb-4b4d-badf-5b46b4052854
2018-10-31T09:59:39.793+1100	INFO	[beat]	instance/beat.go:716	Beat info	{"system_info": {"beat": {"path": {"config": "C:\\Program Files\\metricbeat", "data": "C:\\ProgramData\\metricbeat", "home": "C:\\Program Files\\metricbeat", "logs": "C:\\ProgramData\\metricbeat\\logs"}, "type": "metricbeat", "uuid": "7e4d0236-3dbb-4b4d-badf-5b46b4052854"}}}
2018-10-31T09:59:39.793+1100	INFO	[beat]	instance/beat.go:725	Build info	{"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:22:44.000Z", "version": "6.3.2"}}}
2018-10-31T09:59:39.793+1100	INFO	[beat]	instance/beat.go:728	Go runtime info	{"system_info": {"go": {"os":"windows","arch":"amd64","max_procs":2,"version":"go1.9.4"}}}
2018-10-31T09:59:39.818+1100	INFO	[beat]	instance/beat.go:732	Host info	{"system_info": {"host": {"architecture":"x86_64","boot_time":"2018-10-31T00:03:56.14+11:00","hostname":"EMPNRAP07","ips":["172.21.174.101/24","::1/128","127.0.0.1/8","fe80::5efe:ac15:ae65/128","fe80::100:7f:fffe/64"],"kernel_version":"6.1.7601.24260 (win7sp1_ldr.180908-0600)","mac_addresses":["00:50:56:81:0f:7e","00:00:00:00:00:00:00:e0","00:00:00:00:00:00:00:e0"],"os":{"family":"windows","platform":"windows","name":"Windows Server 2008 R2 Standard","version":"6.1","major":1,"minor":0,"patch":0,"build":"7601.24263"},"timezone":"AEDT","timezone_offset_sec":39600,"id":"e5ff36fb-2990-42e0-9bb2-812048505a05"}}}
2018-10-31T09:59:39.818+1100	INFO	instance/beat.go:225	Setup Beat: metricbeat; Version: 6.3.2
2018-10-31T09:59:39.819+1100	INFO	elasticsearch/client.go:145	Elasticsearch url: http://lgnpvdev5314:9200
2018-10-31T09:59:39.819+1100	INFO	pipeline/module.go:81	Beat name: EMPNRAP07
2018-10-31T09:59:39.819+1100	INFO	instance/beat.go:315	metricbeat start running.
2018-10-31T09:59:39.819+1100	INFO	[monitoring]	log/log.go:97	Starting metrics logging every 30s
2018-10-31T09:59:39.820+1100	INFO	cfgfile/reload.go:122	Config reloader started
2018-10-31T09:59:49.826+1100	INFO	helper/privileges_windows.go:62	Metricbeat process and system info: {"OSVersion":{"Major":6,"Minor":1,"Build":7601},"Arch":"amd64","NumCPU":2,"User":{"SID":"S-1-5-18","Account":"SYSTEM","Domain":"NT AUTHORITY","Type":1},"ProcessPrivs":{"SeAssignPrimaryTokenPrivilege":{"enabled":false},"SeAuditPrivilege":{"enabled_by_default":true,"enabled":true},"SeBackupPrivilege":{"enabled":false},"SeChangeNotifyPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateGlobalPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePagefilePrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePermanentPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateSymbolicLinkPrivilege":{"enabled_by_default":true,"enabled":true},"SeDebugPrivilege":{"enabled_by_default":true,"enabled":true},"SeImpersonatePrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseBasePriorityPrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseQuotaPrivilege":{"enabled":false},"SeIncreaseWorkingSetPrivilege":{"enabled_by_default":true,"enabled":true},"SeLoadDriverPrivilege":{"enabled":false},"SeLockMemoryPrivilege":{"enabled_by_default":true,"enabled":true},"SeManageVolumePrivilege":{"enabled":false},"SeProfileSingleProcessPrivilege":{"enabled_by_default":true,"enabled":true},"SeRestorePrivilege":{"enabled":false},"SeSecurityPrivilege":{"enabled":false},"SeShutdownPrivilege":{"enabled":false},"SeSystemEnvironmentPrivilege":{"enabled":false},"SeSystemProfilePrivilege":{"enabled_by_default":true,"enabled":true},"SeSystemtimePrivilege":{"enabled":false},"SeTakeOwnershipPrivilege":{"enabled":false},"SeTcbPrivilege":{"enabled_by_default":true,"enabled":true},"SeTimeZonePrivilege":{"enabled_by_default":true,"enabled":true},"SeUndockPrivilege":{"enabled":false}}}
2018-10-31T09:59:49.826+1100	INFO	helper/privileges_windows.go:70	SeDebugPrivilege is enabled. SeDebugPrivilege=(Default, Enabled)
2018-10-31T09:59:49.828+1100	WARN	[cfgwarn]	service/service.go:32	BETA: The windows service metricset is beta
2018-10-31T09:59:49.828+1100	INFO	cfgfile/reload.go:253	Starting 4 runners ...
2018-10-31T09:59:50.941+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-31T09:59:51.943+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://lgnpvdev5314:9200: proxyconnect tcp: lookup http: no such host
2018-10-31T09:59:51.991+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-31T09:59:53.992+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://lgnpvdev5314:9200: proxyconnect tcp: lookup http: no such host
2018-10-31T09:59:54.025+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-31T09:59:58.217+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://lgnpvdev5314:9200: proxyconnect tcp: lookup http: no such host
2018-10-31T09:59:58.235+1100	INFO	[publish]	pipeline/retry.go:149	retryer: send wait signal to consumer
2018-10-31T09:59:58.235+1100	INFO	[publish]	pipeline/retry.go:151	  done
2018-10-31T09:59:58.410+1100	WARN	transport/tcp.go:36	DNS lookup failure "http": lookup http: no such host
2018-10-31T10:00:03.133+1100	ERROR	process/process.go:454	Error getting process details. pid=4960: error getting process arguments for pid=4960: ProcArgs failed for pid=4960: could not get Win32_Process WHERE ProcessId = 4960: wmi: cannot load field "CommandLine" into a "string": unsupported type (<nil>)
2018-10-31T10:00:04.960+1100	ERROR	process/process.go:454	Error getting process details. pid=6252: error getting process arguments for pid=6252: ProcArgs failed for pid=6252: could not get Win32_Process WHERE ProcessId = 6252: wmi: cannot load field "CommandLine" into a "string": unsupported type (<nil>)
2018-10-31T10:00:05.522+1100	ERROR	process/process.go:454	Error getting process details. pid=6568: error getting process arguments for pid=6568: ProcArgs failed for pid=6568: could not get Win32_Process WHERE ProcessId = 6568: wmi: cannot load field "CommandLine" into a "string": unsupported type (<nil>)
2018-10-31T10:00:06.411+1100	ERROR	pipeline/output.go:74	Failed to connect: Get http://lgnpvdev5314:9200: proxyconnect tcp: lookup http: no such host
2018-10-31T10:00:06.416+1100	INFO	[publish]	pipeline/retry.go:172	retryer: send unwait-signal to consumer

(Andrew Cholakian) #7

It looks like the issue is with a configured HTTP proxy. Have you configured Windows to use one on that box?
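
For anyone following along, the hint is in the error line itself: the word "proxyconnect" appears before "tcp", and the host being looked up ("http") is not the host in the URL. A rough Python sketch that pulls those pieces out of a log line is below. It is not part of Metricbeat; the function name `explain_connect_error` and the regular expression are made up for illustration and only cover the message format shown in this thread.

```python
import re
from urllib.parse import urlsplit

# Matches the connect error text seen in the logs in this thread.
# The optional "proxyconnect " group is what suggests a proxy is involved.
ERROR_RE = re.compile(
    r"Get (?P<url>\S+): (?P<stage>proxyconnect )?tcp: "
    r"lookup (?P<host>[^:\s]+): no such host"
)


def explain_connect_error(line):
    """Return (target_host, looked_up_host, via_proxy), or None if no match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    target_host = urlsplit(match.group("url")).hostname
    via_proxy = match.group("stage") is not None
    return target_host, match.group("host"), via_proxy


if __name__ == "__main__":
    sample = (
        "Failed to connect: Get http://10.150.160.145:9200: "
        "proxyconnect tcp: lookup http: no such host"
    )
    print(explain_connect_error(sample))
```

If the looked-up host differs from the target host and the proxy flag is set, the failing lookup is most likely for the proxy address rather than for the Elasticsearch host.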


(David) #8

I'm assuming you want me to compare the proxy settings on the problematic server with those on a working one, correct?

This is what I did to check; let me know if I did something wrong here:
I ran cmd as administrator and executed the "netsh winhttp show proxy" command.
The result is the same on both the problematic server and the working one: "Direct access (no proxy server)".

Forgot to mention: on both servers, I installed Metricbeat as a service, and the service is run by "Local System".


(David) #9

Hello? Did I do something wrong?


(David) #10

One more piece of information I forgot to mention: the Elasticsearch instance from my first post is configured to accept Metricbeat data directly from about 20 servers. I'm not sure whether that could cause a conflict and whether the data needs to go through Logstash first instead.

I have another Elasticsearch instance (on a different server, but with similar settings) that receives Filebeat data from these same problematic servers, and that works just fine. However, they are configured to point at Logstash rather than directly at Elasticsearch.


(David) #11

I finally found the issue. It turns out that Metricbeat uses the proxy defined in the environment variables HTTP_PROXY, HTTPS_PROXY and NO_PROXY. Some proxies defined there were no longer working, and I had to clear the values of both variables to fix the issue (since we don't need a proxy to reach the machine).
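
In case it helps someone else, a quick way to check whether these variables are set on a box is a small script like the one below. This is only a sketch in Python, not anything that ships with Metricbeat; `find_proxy_settings` is a made-up helper name. It assumes the script is run in the same context as the service (Local System here), since a service may see different environment variables than an interactive session.

```python
import os

# The three variables named in the post above.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def find_proxy_settings(environ=None):
    """Return the proxy-related variables that have a non-empty value.

    Names are compared case-insensitively, so http_proxy is also reported.
    """
    env = os.environ if environ is None else environ
    found = {}
    for key, value in env.items():
        if key.upper() in PROXY_VARS and value.strip():
            found[key.upper()] = value
    return found


if __name__ == "__main__":
    settings = find_proxy_settings()
    if not settings:
        print("No proxy environment variables are set.")
    for name, value in settings.items():
        print(f"{name}={value}")
```

If this prints a proxy that is no longer reachable, clearing that variable and restarting the service is the fix described above.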