OK, some updates:
- I changed the following properties to these values:
idle_timeout: 10s
max_retries: 5
bulk_max_size: 1
flush_interval: 10
After setting those that way, Filebeat was able to transmit logs all day without any timeouts! Yes!!
However, just a few minutes ago the first new problem showed up. Here is the log:
2016-06-16T18:43:21Z INFO Registry file updated. 1532 states written.
2016-06-16T19:00:37Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:55450->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:00:37Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:55450->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:00:37Z INFO send fail
2016-06-16T19:00:37Z INFO backoff retry: 1s
2016-06-16T19:00:38Z INFO Events sent: 1
2016-06-16T19:00:38Z INFO Registry file updated. 1532 states written.
2016-06-16T19:10:57Z INFO Harvester started for file: F:\Logs\ULS\STSP-WFE01-20160616-1910.log
2016-06-16T19:10:57Z INFO Registry file updated. 1533 states written.
2016-06-16T19:17:32Z INFO Read line error: file inactive
2016-06-16T19:22:17Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57623->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:22:17Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57623->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:22:17Z INFO send fail
2016-06-16T19:22:17Z INFO backoff retry: 1s
2016-06-16T19:22:19Z INFO Events sent: 2
2016-06-16T19:22:19Z INFO Registry file updated. 1533 states written.
2016-06-16T19:40:24Z INFO Read line error: file inactive
2016-06-16T19:41:03Z INFO Harvester started for file: F:\Logs\ULS\STSP-WFE01-20160616-1940.log
2016-06-16T19:41:03Z INFO Registry file updated. 1534 states written.
2016-06-16T19:54:47Z ERR Failed to perform any bulk index operations: Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57754->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:54:47Z INFO Error publishing events (retrying): Post http://10.0.1.20:9200/_bulk: read tcp 10.183.220.10:57754->10.0.1.20:9200: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2016-06-16T19:54:47Z INFO send fail
2016-06-16T19:54:47Z INFO backoff retry: 1s
2016-06-16T19:54:50Z INFO Events sent: 2
2016-06-16T19:54:50Z INFO Registry file updated. 1534 states written.
2016-06-16T19:56:00Z INFO Events sent: 1
2016-06-16T19:56:00Z INFO Registry file updated. 1534 states written.
Now it just keeps repeating that pattern. This is on the IaaS web server (SharePoint); the PaaS (cloud service) is still doing fine. When I tried to access the index coming from that web server, Kibana threw a nasty red error at the top saying the index was not found.
I tried again a few minutes later and was able to access the index in Kibana for that web server. It seems to have gone through a period of 'hiccups' for some reason. Could it be related to those "Read line error: file inactive" messages, or is it something else?
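To check whether the timeouts actually line up with the "file inactive" messages, something like the rough Python sketch below could help. It assumes the log lines look like the ones pasted above (`<timestamp>Z <LEVEL> <message>`). The function names are made up for illustration and are not part of Filebeat, and the log path is whatever you pass on the command line.

```python
import re
import sys
from datetime import datetime

# Matches lines shaped like: 2016-06-16T19:00:37Z ERR Failed to perform ...
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z (\w+) (.*)$")


def parse(lines):
    """Yield (timestamp, level, message) for each line in the assumed format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            yield ts, m.group(2), m.group(3)


def seconds_since_inactive(lines):
    """For each ERR entry, report how long ago the last 'file inactive' line was.

    Returns a list of (timestamp, seconds), where seconds is None if no
    'file inactive' message has been seen before that error.
    """
    last_inactive = None
    result = []
    for ts, level, msg in parse(lines):
        if "file inactive" in msg:
            last_inactive = ts
        elif level == "ERR":
            gap = None
            if last_inactive is not None:
                gap = (ts - last_inactive).total_seconds()
            result.append((ts, gap))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the path is whatever your Filebeat log file is): python check_log.py <logfile>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ts, gap in seconds_since_inactive(fh):
            print(ts.isoformat(), "seconds since last 'file inactive':", gap)
```

If the gaps vary a lot from one error to the next, the two messages are probably unrelated. Note that the first error in the log above (19:00:37) shows up before any "file inactive" line in the pasted excerpt.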