Filebeat 5.0 - stuck in "ERR Problem putting initial state: prospector outlet closed"


#1

Hi,

First of all congratz on 5.0 release!

I've encountered a problem after upgrading from 5-0-rc1 to 5.0 with the same config that I used for 5.0-rc1.

Filebeat does nothing now and only sends "ERR Problem putting initial state: prospector outlet closed" into logs without shipping any events.

Is this anyhow expected? Do you want me to try nuking my registry files?


(ruflin) #2

Hm, that is strange. This would kind of mean your spooler was stopped. Any chance to start filebeat in debug mode and share the previous debug lines? Please keep a copy of the current registry file in case we need it later to debug an issue.


#3

After downgrading to rc1 everything went back to normal.
I'll spin another update.


(ruflin) #4

Great. Please try to enable the debug log.


#5

Ok, two things.

  1. Despite running "service filebeat stop" the filebeat process (but not filebeat-god) is still running several minutes after -- I believe the section after [...] was stored after service stop attempt.

  2. Debug logging:

2016-10-26T21:42:10Z DBG  Disable stderr logging
2016-10-26T21:42:10Z INFO Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2016-10-26T21:42:10Z INFO Setup Beat: filebeat; Version: 5.0.0
2016-10-26T21:42:10Z DBG  Processors: 
2016-10-26T21:42:10Z INFO Metrics logging every 30s
2016-10-26T21:42:10Z INFO Max Retries set to: 3
2016-10-26T21:42:10Z DBG  configure maxattempts: 4
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  connect
ancer: start client loop
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  load balancer: start client loop
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z DBG  connect
2016-10-26T21:42:10Z INFO Activated logstash as output plugin.
2016-10-26T21:42:10Z DBG  Create output worker
2016-10-26T21:42:10Z DBG  No output is defined to store the topology. The server fields might not be filled.
2016-10-26T21:42:10Z INFO Publisher name: hostname7
2016-10-26T21:42:10Z INFO Flush Interval set to: 1s
2016-10-26T21:42:10Z INFO Max Bulk Size set to: 4096
2016-10-26T21:42:10Z DBG  create bulk processing worker (interval=1s, bulk size=4096)
2016-10-26T21:42:10Z INFO filebeat start running.
2016-10-26T21:42:10Z INFO Registry file set to: /var/lib/filebeat/registry
2016-10-26T21:42:10Z INFO Loading registrar data from /var/lib/filebeat/registry
2016-10-26T21:42:11Z INFO States Loaded from registrar: 4065
2016-10-26T21:42:11Z INFO Loading Prospectors: 4
2016-10-26T21:42:11Z INFO Starting Registrar
2016-10-26T21:42:11Z INFO Start sending events to output
2016-10-26T21:42:11Z INFO Starting spooler: spool_size: 16384; idle_timeout: 3s
2016-10-26T21:42:11Z DBG  exclude_files: [^.nfs]
2016-10-26T21:42:11Z INFO Load previous states from registry into memory
2016-10-26T21:42:14Z DBG  Flushing spooler because of timeout. Events flushed: 9
2016-10-26T21:42:14Z DBG  No events to publish
2016-10-26T21:42:17Z DBG  Flushing spooler because of timeout. Events flushed: 6
2016-10-26T21:42:17Z DBG  No events to publish
2016-10-26T21:42:20Z DBG  Flushing spooler because of timeout. Events flushed: 14
2016-10-26T21:42:20Z DBG  No events to publish
2016-10-26T21:42:23Z DBG  Flushing spooler because of timeout. Events flushed: 3
2016-10-26T21:42:23Z DBG  No events to publish
2016-10-26T21:42:26Z DBG  Flushing spooler because of timeout. Events flushed: 3
2016-10-26T21:42:26Z DBG  No events to publish
2016-10-26T21:42:29Z DBG  Flushing spooler because of timeout. Events flushed: 3
2016-10-26T21:42:29Z DBG  No events to publish
2016-10-26T21:42:32Z DBG  Flushing spooler because of timeout. Events flushed: 4
2016-10-26T21:42:32Z DBG  No events to publish
2016-10-26T21:42:35Z DBG  Flushing spooler because of timeout. Events flushed: 4
2016-10-26T21:42:35Z DBG  No events to publish
[...]
2016-10-26T21:45:18Z INFO Prospector outlet closed
2016-10-26T21:45:18Z ERR Problem putting initial state: prospector outlet closed
2016-10-26T21:45:18Z INFO Prospector outlet closed
2016-10-26T21:45:18Z ERR Problem putting initial state: prospector outlet closed
2016-10-26T21:45:20Z DBG  Flushing spooler because of timeout. Events flushed: 0
2016-10-26T21:45:20Z INFO Prospector outlet closed
2016-10-26T21:45:20Z ERR Problem putting initial state: prospector outlet closed
2016-10-26T21:45:22Z INFO Prospector outlet closed
2016-10-26T21:45:22Z ERR Problem putting initial state: prospector outlet closed

Not sure what those "Events flushed" means as nothing is finding its way to Elasticsearch.

Again, after downgrade to rc1 everything works just fine. This was tested on different node, but with the same configuration.


(ruflin) #6

Good to know that the parts after [...] only appear after trying to stop filebeat. That explains this line.

It seems that filebeat hangs / does not complete the first loading of the states. This is indeed something we changed between rc1 and GA. Based on the error messages above I assume it happens when the new state is sent to the spooler. Somehow the spooler does not confirm the event. Perhaps there is an issue which is caused by a large number of files or states. I would assume cleaning the registry would solve it as then no states have to be loaded.

Any chance you could either post or send me your current registry and config file?

The flushing just happens frequently based on the timeout and can be ignored.


#7

Yes, it's starting without any problem with an empty registry file.


(ruflin) #8

I think we have here 2 bugs. First one is that it doesn't properly shut down if something goes wrong. Second one is that we need some error reporting on what is going wrong. I currently suspect it happens in OnEvent and is blocking there but need to find a way to reproduce it myself to debug it.

Any chance to share your registry file? It seems you have 4065 entries inside. Will try to load registry of this size with a blocking output to see what happens.


(system) #9

This topic was automatically closed after 21 days. New replies are no longer allowed.