Filebeat not connecting directly to Elasticsearch from particular machine

OK, updates:

  • I have changed the following properties to the following values:

idle_timeout: 10s
max_retries: 5
bulk_max_size: 1
flush_interval: 10

After setting those that way, Filebeat was able to transmit logs all day without any timeouts! Yes!!

However, just a few minutes ago the first new problem appeared. Log:

2016-06-16T18:43:21Z INFO Registry file updated. 1532 states written.
2016-06-16T19:00:37Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:55450->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:00:37Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:55450->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:00:37Z INFO send fail
2016-06-16T19:00:37Z INFO backoff retry: 1s
2016-06-16T19:00:38Z INFO Events sent: 1
2016-06-16T19:00:38Z INFO Registry file updated. 1532 states written.
2016-06-16T19:10:57Z INFO Harvester started for file: F:\Logs\ULS\STSP-WFE01-20160616-1910.log
2016-06-16T19:10:57Z INFO Registry file updated. 1533 states written.
2016-06-16T19:17:32Z INFO Read line error: file inactive
2016-06-16T19:22:17Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57623->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:22:17Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57623->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:22:17Z INFO send fail
2016-06-16T19:22:17Z INFO backoff retry: 1s
2016-06-16T19:22:19Z INFO Events sent: 2
2016-06-16T19:22:19Z INFO Registry file updated. 1533 states written.
2016-06-16T19:40:24Z INFO Read line error: file inactive
2016-06-16T19:41:03Z INFO Harvester started for file: F:\Logs\ULS\STSP-WFE01-20160616-1940.log
2016-06-16T19:41:03Z INFO Registry file updated. 1534 states written.
2016-06-16T19:54:47Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57754->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:54:47Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57754->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:54:47Z INFO send fail
2016-06-16T19:54:47Z INFO backoff retry: 1s
2016-06-16T19:54:50Z INFO Events sent: 2
2016-06-16T19:54:50Z INFO Registry file updated. 1534 states written.
2016-06-16T19:56:00Z INFO Events sent: 1
2016-06-16T19:56:00Z INFO Registry file updated. 1534 states written.

Now it just keeps repeating that. This is on the IaaS web server (SharePoint); the PaaS (cloud service) is still doing fine. When I try to access the index coming from that web server, Kibana throws a red error at the top saying the index was not found.

I tried again a few minutes later and was able to access the index for that web server in Kibana. It seems to have gone through a period of 'hiccups' for some reason. Could it be related to those "Read line error: file inactive" messages?
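One way to look for a pattern in a log like the one above is to measure how long the output had been quiet before each failure. The following is only a rough sketch: the function name `idle_gaps_before_errors` and the `SAMPLE` list are made up for illustration, and the sample lines are copied from the log above with the long error text cut short. It measures the time between the last "Events sent" line and the next ERR line.

```python
import re
from datetime import datetime

# Matches Filebeat log lines of the form "<timestamp>Z <LEVEL> <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z (\w+) (.*)$")

# Lines taken from the log in this thread; error messages are truncated here.
SAMPLE = [
    "2016-06-16T19:00:38Z INFO Events sent: 1",
    r"2016-06-16T19:10:57Z INFO Harvester started for file: F:\Logs\ULS\STSP-WFE01-20160616-1910.log",
    "2016-06-16T19:22:17Z ERR Failed to perform any bulk index operations: ...",
    "2016-06-16T19:22:19Z INFO Events sent: 2",
    "2016-06-16T19:54:47Z ERR Failed to perform any bulk index operations: ...",
    "2016-06-16T19:54:50Z INFO Events sent: 2",
]


def idle_gaps_before_errors(lines):
    """Return (error_time, seconds since the last 'Events sent') per ERR line."""
    gaps = []
    last_sent = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped log line
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        level, message = match.group(2), match.group(3)
        if level == "ERR" and last_sent is not None:
            gaps.append((ts, (ts - last_sent).total_seconds()))
        elif message.startswith("Events sent"):
            last_sent = ts
    return gaps


if __name__ == "__main__":
    for ts, gap in idle_gaps_before_errors(SAMPLE):
        print(f"{ts.isoformat()} failed after {gap:.0f}s without a send")
```

For these sample lines the gaps are about 21 and 32 minutes. Failures that only follow long quiet periods would be consistent with an idle connection being dropped somewhere between Filebeat and Elasticsearch, but that is an assumption, not something the log proves.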

Based on all the descriptions and behaviour above, I'm quite confident the issue is not directly related to Filebeat or Kibana, but to the network connectivity of both to Elasticsearch.

As far as I understand, all your requests go through the ILB? Did you check the ILB logs to see why some requests fail?

This is an ILB configured through PowerShell on Microsoft Azure. Yes, all requests do go through the ILB, which contains a backend pool of ES data VMs.

Sadly, I cannot find any logging on that ILB.

Not sure how we should move forward here, as I think the ILB logs could be very helpful in this case.
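Without ILB logs, one fallback might be to record connectivity from the affected web server itself. The sketch below is an assumption about how that could be done, not something tried in this thread: `probe` and `watch` are hypothetical helper names, and the address 10.0.1.20:9200 is the one shown in the Filebeat log. Note that it opens a fresh TCP connection each time, so it only shows whether new connections through the ILB succeed. It does not reproduce a long-lived connection going idle.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refusals and unreachable hosts
        return False, time.monotonic() - start, str(exc)


def watch(host, port, interval=30.0, count=10):
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(count):
        ok, elapsed, err = probe(host, port)
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        status = "ok" if ok else f"FAILED ({err})"
        print(f"{stamp} {host}:{port} {status} in {elapsed:.3f}s")
        time.sleep(interval)
```

Running `watch("10.0.1.20", 9200)` on the web server and comparing its timestamps with the Filebeat ERR lines could show whether the ILB address is unreachable at those moments, or whether only already-open connections are affected.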
