Hi, I have a server that hosts Elasticsearch (call it server A, located at 10.150.160.145), and Metricbeat is deployed to many servers on the same network. Nine out of ten send their Metricbeat data to server A with no problem. The exception is one machine (call it B), which has exactly the same metricbeat.yml settings as the other 9 servers.
Here are the lines I'm seeing in the Metricbeat log file on B:
2018-10-29T15:17:16.598+1100 INFO instance/beat.go:492 Home path: [C:\Program Files\metricbeat] Config path: [C:\Program Files\metricbeat] Data path: [C:\ProgramData\metricbeat] Logs path: [C:\ProgramData\metricbeat\logs]
2018-10-29T15:17:16.604+1100 INFO instance/beat.go:499 Beat UUID: 7e4d0236-3dbb-4b4d-badf-5b46b4052854
2018-10-29T15:17:16.604+1100 INFO [beat] instance/beat.go:716 Beat info {"system_info": {"beat": {"path": {"config": "C:\\Program Files\\metricbeat", "data": "C:\\ProgramData\\metricbeat", "home": "C:\\Program Files\\metricbeat", "logs": "C:\\ProgramData\\metricbeat\\logs"}, "type": "metricbeat", "uuid": "7e4d0236-3dbb-4b4d-badf-5b46b4052854"}}}
2018-10-29T15:17:16.604+1100 INFO [beat] instance/beat.go:725 Build info {"system_info": {"build": {"commit": "45a9a9e1561b6c540e94211ebe03d18abcacae55", "libbeat": "6.3.2", "time": "2018-07-20T04:22:44.000Z", "version": "6.3.2"}}}
2018-10-29T15:17:16.604+1100 INFO [beat] instance/beat.go:728 Go runtime info {"system_info": {"go": {"os":"windows","arch":"amd64","max_procs":2,"version":"go1.9.4"}}}
2018-10-29T15:17:16.616+1100 INFO [beat] instance/beat.go:732 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2018-10-04T00:06:06.62+10:00","hostname":"EMPNRAP07","ips":["172.21.174.101/24","::1/128","127.0.0.1/8","fe80::5efe:ac15:ae65/128","fe80::100:7f:fffe/64"],"kernel_version":"6.1.7601.24231 (win7sp1_ldr.180810-0600)","mac_addresses":["00:50:56:81:0f:7e","00:00:00:00:00:00:00:e0","00:00:00:00:00:00:00:e0"],"os":{"family":"windows","platform":"windows","name":"Windows Server 2008 R2 Standard","version":"6.1","major":1,"minor":0,"patch":0,"build":"7601.24241"},"timezone":"AEDT","timezone_offset_sec":39600,"id":"e5ff36fb-2990-42e0-9bb2-812048505a05"}}}
2018-10-29T15:17:16.617+1100 INFO instance/beat.go:225 Setup Beat: metricbeat; Version: 6.3.2
2018-10-29T15:17:16.617+1100 INFO elasticsearch/client.go:145 Elasticsearch url: http://10.150.160.145:9200
2018-10-29T15:17:16.617+1100 INFO pipeline/module.go:81 Beat name: EMPNRAP07
2018-10-29T15:17:16.618+1100 INFO instance/beat.go:315 metricbeat start running.
2018-10-29T15:17:16.618+1100 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2018-10-29T15:17:16.619+1100 INFO cfgfile/reload.go:122 Config reloader started
2018-10-29T15:17:26.624+1100 INFO helper/privileges_windows.go:62 Metricbeat process and system info: {"OSVersion":{"Major":6,"Minor":1,"Build":7601},"Arch":"amd64","NumCPU":2,"User":{"SID":"S-1-5-18","Account":"SYSTEM","Domain":"NT AUTHORITY","Type":1},"ProcessPrivs":{"SeAssignPrimaryTokenPrivilege":{"enabled":false},"SeAuditPrivilege":{"enabled_by_default":true,"enabled":true},"SeBackupPrivilege":{"enabled":false},"SeChangeNotifyPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateGlobalPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePagefilePrivilege":{"enabled_by_default":true,"enabled":true},"SeCreatePermanentPrivilege":{"enabled_by_default":true,"enabled":true},"SeCreateSymbolicLinkPrivilege":{"enabled_by_default":true,"enabled":true},"SeDebugPrivilege":{"enabled_by_default":true,"enabled":true},"SeImpersonatePrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseBasePriorityPrivilege":{"enabled_by_default":true,"enabled":true},"SeIncreaseQuotaPrivilege":{"enabled":false},"SeIncreaseWorkingSetPrivilege":{"enabled_by_default":true,"enabled":true},"SeLoadDriverPrivilege":{"enabled":false},"SeLockMemoryPrivilege":{"enabled_by_default":true,"enabled":true},"SeManageVolumePrivilege":{"enabled":false},"SeProfileSingleProcessPrivilege":{"enabled_by_default":true,"enabled":true},"SeRestorePrivilege":{"enabled":false},"SeSecurityPrivilege":{"enabled":false},"SeShutdownPrivilege":{"enabled":false},"SeSystemEnvironmentPrivilege":{"enabled":false},"SeSystemProfilePrivilege":{"enabled_by_default":true,"enabled":true},"SeSystemtimePrivilege":{"enabled":false},"SeTakeOwnershipPrivilege":{"enabled":false},"SeTcbPrivilege":{"enabled_by_default":true,"enabled":true},"SeTimeZonePrivilege":{"enabled_by_default":true,"enabled":true},"SeUndockPrivilege":{"enabled":false}}}
2018-10-29T15:17:26.624+1100 INFO helper/privileges_windows.go:70 SeDebugPrivilege is enabled. SeDebugPrivilege=(Default, Enabled)
2018-10-29T15:17:26.625+1100 WARN [cfgwarn] service/service.go:32 BETA: The windows service metricset is beta
2018-10-29T15:17:26.625+1100 INFO cfgfile/reload.go:253 Starting 4 runners ...
2018-10-29T15:17:27.710+1100 WARN transport/tcp.go:36 DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:28.721+1100 ERROR pipeline/output.go:74 Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:28.754+1100 WARN transport/tcp.go:36 DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:30.755+1100 ERROR pipeline/output.go:74 Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:30.787+1100 WARN transport/tcp.go:36 DNS lookup failure "http": lookup http: no such host
2018-10-29T15:17:34.816+1100 ERROR pipeline/output.go:74 Failed to connect: Get http://10.150.160.145:9200: proxyconnect tcp: lookup http: no such host
2018-10-29T15:17:34.817+1100 INFO [publish] pipeline/retry.go:149 retryer: send wait signal to consumer
2018-10-29T15:17:34.817+1100 INFO [publish] pipeline/retry.go:151 done
2018-10-29T15:17:34.850+1100 WARN transport/tcp.go:36 DNS lookup failure "http": lookup http: no such host
It claims to have a DNS lookup issue, but the weird thing is that I can ping that server (10.150.160.145, i.e. server A) from B, and I can even open 10.150.160.145:9200 directly in a browser on B and get the "You Know, for Search" response!
So why is Metricbeat on this specific server claiming it's a DNS issue, when the output URL is a plain IP address and the name it fails to look up is "http"? I'm quite at a loss here and could use any tips.
Now, I admit that I am assuming the other 9 servers have the same configuration, so maybe I'm wrong. If that's the case, where do I look for configuration to compare? These machines are running Windows Server 2008 and 2012. I checked the DNS servers they are using and they look very similar, so I don't think that's the problem (and, after all, I can connect to 10.150.160.145:9200 directly in my browser on machine B).
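
One thing I could compare across machines is the proxy-related environment, since the error text says "proxyconnect" and the name that fails to resolve is "http" rather than anything from my metricbeat.yml. Below is a minimal sketch, assuming Python is available on the servers. The helper names proxy_settings and proxy_host are my own, and I'm only guessing that the HTTP_PROXY / HTTPS_PROXY / NO_PROXY variables are the relevant ones here.

```python
import os
from urllib.parse import urlsplit

# Assumption: these are the variables worth comparing between B and a working server.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(env):
    """Return the proxy-related variables in env, matching names case-insensitively."""
    return {name: value for name, value in env.items() if name.upper() in PROXY_VARS}


def proxy_host(value):
    """Return the hostname part of a proxy URL, or None if it has no host component."""
    return urlsplit(value).hostname


if __name__ == "__main__":
    # Print each proxy variable with the hostname that would have to be resolved.
    for name, value in sorted(proxy_settings(os.environ).items()):
        print(f"{name}={value!r} host={proxy_host(value)!r}")
```

Running this on B and on one of the working servers and diffing the output would show whether B has a proxy variable the others lack. One caveat: according to the log, Metricbeat runs as the SYSTEM account, so the system-wide environment variables matter, and a script run from my own login session may see a different set.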