Can Filebeat halt on failed published event?

(Ionut Dinu) #1


Is there such an option to stop publishing (anything new) when output fails for whatever reason?

To give an example
My Filebeat config (minimal) is like this:

filebeat.prospectors:
- type: log
  paths:
    - /path/to/file.log
  json.keys_under_root: true

output.elasticsearch:
  hosts: [""]
  template.enabled: false
  index: "log-%{[index]}"

My log file is already in JSON format (one JSON object per line).

For different reasons (e.g. field type mismatch, missing mappings, Elasticsearch down, etc.) the publish action may fail. I don't want to lose that log line. I'd like Filebeat to stop harvesting and publishing and alert me somehow (lock file, email, smoke signals, anything I can monitor). Then I can fix the issue and restart Filebeat.

All I have now is a WARN in the log like this (I know what it is: a missing mapping, which is intentional for this test):

2018-09-25T23:21:05.999+0300	WARN	elasticsearch/client.go:502	Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbee2c40c3a701471, ext:15139988, loc:(*time.Location)(0x52ca960)}, Meta:common.MapStr(nil), Fields:common.MapStr{"beat":common.MapStr{"name":"localhost", "hostname":"localhost", "version":"6.2.4"}, "source":"/path/to/file.log", "offset":127, "index":"2018-08-31", "message":"test log for demo", "lines":[]interface {}{common.MapStr{"line":0, "message":"message1"}, common.MapStr{"line":1, "message":"message2"}}, "prospector":common.MapStr{"type":"log"}}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc4200bdd40), Source:"/path/to/file.log", Offset:127, Timestamp:time.Time{wall:0xbee2c40c3a60e101, ext:14143782, loc:(*time.Location)(0x52ca960)}, TTL:-1, Type:"log", FileStateOS:file.StateOS{Inode:0x23d3f8, Device:0x100000a}}}, Flags:0x1} (status=400): {"type":"illegal_argument_exception","reason":"object mapping [lines] can't be changed from nested to non-nested"}
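In the meantime, the "lock file + alert" idea can be scripted outside Filebeat by watching the log for these warnings. A minimal sketch in Python; the log path and lock-file location are placeholders, and the "Cannot index event" marker is taken from the warning above:

```python
#!/usr/bin/env python3
"""Watch a Filebeat log for failed publish events and drop a lock file.

Paths below are placeholders -- adjust them for your installation.
"""
import pathlib
import time

FILEBEAT_LOG = pathlib.Path("/var/log/filebeat/filebeat")        # placeholder
LOCK_FILE = pathlib.Path("/tmp/filebeat-publish-failed.lock")    # placeholder


def find_failed_events(lines):
    """Return the log lines that report a failed publish attempt."""
    return [line for line in lines if "Cannot index event" in line]


def watch(poll_seconds=5):
    """Poll the log; create the lock file on the first failed event."""
    offset = 0
    while not LOCK_FILE.exists():
        with FILEBEAT_LOG.open() as f:
            f.seek(offset)
            failed = find_failed_events(f.readlines())
            offset = f.tell()
        if failed:
            # An external monitor (email, Nagios, ...) can alert on the
            # lock file's presence; the operator can then stop Filebeat,
            # fix the mapping, and restart.
            LOCK_FILE.write_text(failed[0])
        time.sleep(poll_seconds)
```

This doesn't make Filebeat itself halt, but it gives you the monitorable signal while the offending line is still identifiable in the log.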

Thank you

(Pier-Hugues Pellerin) #2

@tunder Hello, Filebeat doesn't have a mechanism to notify about specific problems other than its logs and metrics. That said, the problem would show up indirectly: if Filebeat stopped sending events to ES completely, that would affect the metrics Filebeat collects.

These metrics are sent to an Elasticsearch cluster, and since metrics are just documents in an index, we could configure a Watcher job to check if something is wrong and send an email.
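A watch along those lines might look roughly like this. This is only a sketch, not a tested watch: the index pattern, schedule, email address, and especially the `dropped` field path are assumptions, and it presumes X-Pack monitoring is already shipping Filebeat metrics into the cluster:

```json
PUT _xpack/watcher/watch/filebeat_dropped_events
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": [".monitoring-beats-*"],
        "body": {
          "query": {
            "range": {
              "beats_stats.metrics.libbeat.output.events.dropped": { "gt": 0 }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "Filebeat is dropping events",
        "body": "Check the Filebeat logs for 'Cannot index event' warnings."
      }
    }
  }
}
```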

(Ionut Dinu) #3

Thank you for taking the time to respond to me.

I'm not keen on stacking layer upon layer to solve a problem.
I'm not using X-Pack, but even if I were, my guess is that the information in the metrics is the same as what I find in Filebeat's logs, and it says nothing about failed publish events.

Here is a snippet from the logs:

2018-09-25T23:34:41.192+0300    WARN    elasticsearch/client.go:502 Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xbee2c4d80783261e, ext:60036277031, loc:(*time.Location)(0x52ca960)}, Meta:common.MapStr(nil), Fields:common.MapStr{"prospector":common.MapStr{"type":"log"}, "beat":common.MapStr{"name":"localhost", "hostname":"localhost", "version":"6.2.4"}, "index":"2018-08-31", "source":"/path/to/file.log", "offset":8787, "message":"test log for demo", "lines":[]interface {}{common.MapStr{"message":"message1", "line":0}, common.MapStr{"line":1, "message":"message2"}}}, Private:file.State{Id:"", Finished:false, Fileinfo:(*os.fileStat)(0xc4200b9ad0), Source:"/path/to/file.log", Offset:8787, Timestamp:time.Time{wall:0xbee2c4d807471ad8, ext:60032341994, loc:(*time.Location)(0x52ca960)}, TTL:-1, Type:"log", FileStateOS:file.StateOS{Inode:0x23d438, Device:0x100000a}}}, Flags:0x1} (status=400): {"type":"illegal_argument_exception","reason":"object mapping [lines] can't be changed from nested to non-nested"}
2018-09-25T23:35:10.105+0300    INFO    [monitoring]    log/log.go:124  Non-zero metrics in the last 30s    {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":30,"time":30},"total":{"ticks":64,"time":64,"value":64},"user":{"ticks":34,"time":34}},"info":{"ephemeral_id":"cf4151a8-8f03-4da8-ab2d-e042b14a6b01","uptime":{"ms":90012}},"memstats":{"gc_next":4194304,"memory_alloc":2427488,"memory_total":5861520,"rss":1630208}},"filebeat":{"events":{"added":7,"done":7},"harvester":{"open_files":1,"running":1,"started":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"batches":5,"dropped":5,"total":5},"read":{"bytes":2075},"write":{"bytes":11465}},"pipeline":{"clients":1,"events":{"active":0,"filtered":2,"published":5,"retry":1,"total":7},"queue":{"acked":5}}},"registrar":{"states":{"cleanup":1,"current":1,"update":7},"writes":6},"system":{"load":{"1":2.0464,"15":1.5986,"5":1.7344,"norm":{"1":0.2558,"15":0.1998,"5":0.2168}}}}}}

Do you think this could be considered as a feature request?
I would like Filebeat to stop until the problem is solved (in this case, by adding the mappings; in other cases I may just remove the log line that is causing problems, but I need to know which line that is).

Best regards

PS: I tried adding the xpack.monitoring config parameters (after installing the plugin) and set them up to send the metrics to another Elasticsearch cluster (URL). Filebeat won't start, complaining that:

Exiting: 'xpack.monitoring.elasticsearch.hosts' and 'output.elasticsearch.hosts' are configured

(Pier-Hugues Pellerin) #4

I am not sure we would support stopping completely. We do track the error rate; I will need to double-check whether we expose it in the X-Pack UI, but that metric could be something to monitor. A higher error rate would mean something needs investigating.

(Ionut Dinu) #5

How about allowing metrics to be sent somewhere other than Elasticsearch?

My guess is that it is doing a web request (curl) to send metrics to Elasticsearch. See my previous PS: allow the xpack.monitoring output to be a custom URL (maybe a web service endpoint) where we can read the metrics and apply our own logic (in my case, sending email alerts) when publish events fail.
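Until something like that exists, the same numbers can be scraped from the local log: the "Non-zero metrics" lines carry the `libbeat.output.events.dropped` counter as JSON. A rough sketch of the "do our own logic" idea, parsing those lines (the format matches the 6.2.4 snippet above; treat that as an assumption for other versions):

```python
import json


def dropped_events(log_line):
    """Extract libbeat.output.events.dropped from a 'Non-zero metrics' line.

    Returns 0 when the line is not a metrics line or has no dropped count.
    """
    marker = "Non-zero metrics"
    if marker not in log_line:
        return 0
    # The JSON payload starts at the first '{' after the message text.
    payload = json.loads(log_line[log_line.index("{", log_line.index(marker)):])
    return (payload.get("monitoring", {})
                   .get("metrics", {})
                   .get("libbeat", {})
                   .get("output", {})
                   .get("events", {})
                   .get("dropped", 0))
```

A caller could then run this over each new log line and fire an email (or create a lock file) whenever the result is greater than zero.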

I see this info in the metrics:

"output":{"events":{"batches":5,"dropped":5,"total":5}}

Does this mean that out of 5 batches, all 5 failed?

Thank you

(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.