FileBeat EOF Error


(Pier-Hugues Pellerin) #16

@il.bert you are right that a bulk size of 5-15 MB is ideal; we have work planned to make the ES output use the event size when sizing the bulk requests.

But even if you set it to 15k, the ES output will use a maximum of 500 items per bulk request. So if you have really tiny logs it makes sense in your case to adjust the batch size, but you may also want to adjust the flush_size of the elasticsearch output, see https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
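
For reference, flush_size sits in the Logstash elasticsearch output roughly like this (the host and the value are just illustrative; tune the value to your average event size):

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # illustrative ES endpoint
    flush_size => 5000            # example value; events per bulk request
  }
}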


(Michele) #17

my flush size is 30k :wink: I tried pipeline batch sizes from the default 125 all the way up to 100k
btw, now I have changed it to 1250 and the average disk throughput is a little better (around 5 MB). I will try putting more load on this one, but 125 is definitely too low
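
For reference, the batch size can be set roughly like this (the values and the config path are just examples):

# Logstash 2.x: command-line flag (also usable via LS_OPTS)
bin/logstash -b 1250 -f /etc/logstash/conf.d
# Logstash 5.x: logstash.yml
pipeline.batch.size: 1250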


(Pier-Hugues Pellerin) #18

Perfect, I was missing your logstash config :wink:


#19

I am seeing a similar error. I am using the latest filebeat-linux-arm build on a Raspberry Pi.

2016/09/22 18:50:56.453871 output.go:109: DBG  output worker: publish 156 events
2016/09/22 18:50:56.453981 sync.go:53: DBG  connect
2016/09/22 18:50:56.456413 sync.go:107: DBG  Try to publish 156 events to logstash with window size 10
2016/09/22 18:50:56.461644 client.go:194: DBG  handle error: EOF
2016/09/22 18:50:56.461826 client.go:110: DBG  closing
2016/09/22 18:50:56.462080 sync.go:78: DBG  0 events out of 156 events sent to logstash. Continue sending
2016/09/22 18:50:56.462213 sync.go:58: DBG  close connection
2016/09/22 18:50:56.462393 sync.go:85: ERR Failed to publish events caused by: EOF
2016/09/22 18:50:56.462511 single.go:91: INFO Error publishing events (retrying): EOF
2016/09/22 18:50:56.462605 sync.go:58: DBG  close connection

I don't think Logstash is to blame, because it succeeds with an older build: filebeat version 1.2.0-SNAPSHOT (arm).


(Steffen Siering) #20

@pixelrebel please start another discussion. Add details like Logstash logs and the versions being used. Does the EOF happen only from time to time, or right from the beginning?


(Maxwell Flanders) #21

Hey guys, I found that the root of our issue was actually a misconfigured filter on the Logstash end that was clogging the filter workers' queue. I guess that eventually manifested itself in Filebeat's logs.

Edit: Posted this in the wrong thread. Copying it to the correct thread that I started: Missing a lot of logs sent via filebeats


(Steffen Siering) #22

It's a combination of filters/outputs in Logstash slowing down the inputs + the beats input closing the connection when this happens.

Related github issue for most recent LS beats input plugin: https://github.com/logstash-plugins/logstash-input-beats/issues/141

Workarounds: on older plugin versions increase congestion_threshold, on newer versions increase client_inactivity_timeout. In either case it's a good idea to try to figure out the reason for the slowdowns in LS, such as some expensive filters.
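
Roughly, that goes into the beats input like this (the port and the values are just illustrative, and which option applies depends on your plugin version):

input {
  beats {
    port => 5044
    # older plugin versions: allow more time before LS reports congestion
    # congestion_threshold => 30
    # newer plugin versions: keep idle connections open longer (seconds)
    client_inactivity_timeout => 900
  }
}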


(Maxwell Flanders) #23

Yeah, mine was a case where we had a DNS filter with the wrong DNS IPs in it, so it was doomed to utter failure. However, in cases where honest throughput is just so heavy that it slows you down, these tools could still definitely help. Would increasing the Logstash worker count or the Filebeat worker count be expected to help in a situation like that?


(Steffen Siering) #24

There are a number of options in Filebeat to increase throughput (some at the cost of increased memory usage).

e.g. filebeat.publish_async and output.logstash.pipelined. Adding workers + enabling load balancing can increase throughput as well (it depends on the mix of workers, filebeat.spool_size and output.logstash.bulk_max_size).
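
As a rough sketch of where these knobs live in filebeat.yml (the hosts and numbers are just examples, not recommendations):

filebeat.spool_size: 4096        # bigger spooler -> bigger batches (more memory)
filebeat.publish_async: true     # publish batches asynchronously

output.logstash:
  hosts: ["ls1:5044", "ls2:5044", "ls3:5044"]   # illustrative hosts
  loadbalance: true
  worker: 2                      # workers per configured host
  bulk_max_size: 2048            # max events per batch sent to Logstash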


(Michele) #25

Hi, sadly our test did not go well; none of the above workarounds worked 100%, we were still getting that error.
As we had the opportunity, we moved to a larger cluster: 3 HUGE machines. We expected one of them to be able to take the whole load, but we used 3 for the test.

It happened again. EOF

Thus it could not be a problem of a fully loaded machine, so we moved on to changing other parameters.
We found out that removing load_balance in Filebeat solved the problem, or at least it seems to.

Again, using only one single Logstash output in Filebeat works well (we split our sources into 3 groups and each group goes to one LS machine); we have had this configuration for 24h now and it's working pretty well.

This is not a good solution for us: we have a cluster of machines and we want it to work efficiently and reliably.
Do you have any idea why this is happening? How can we solve it?

thanks


(Steffen Siering) #26

Can you elaborate on which setups + configs you tried exactly and how they failed (too slow, filebeat getting stuck...)?

Did you update the beats input plugin?

Did you set client_inactivity_timeout to 15 minutes?


(Michele) #27

Yes I did, but only small improvements.

Now I have Logstash with 24 workers and 8 GB of heap space; everything else is default as you suggested (better machines, no throughput problems!).
Filebeat has the default batch size and flush time.

They fail with the EOF error in Filebeat and no logs being collected from that machine.
Now I have the same configuration but no load balancing: the logger machines are divided into groups and each group is mapped to one single LS node, and it is working.

The filters I use are just elapsed, grok and mutate as of now; I don't want to extend them before I know it is working.

thanks


(Steffen Siering) #28

Can you share logs from Logstash and Filebeat + the Filebeat configuration + the Logstash beats input settings? EOF (end of file) is triggered by the remote closing the socket for whatever reason; it's not in the control of Filebeat itself. I don't see how proper load-balancing in Filebeat can trigger an EOF in Filebeat, besides potentially longer waiting times. Plus beats should reconnect upon EOF, that is, it's not really critical.


(Michele) #29

After 2 days of running, the EOF came up also with the configuration without load_balance.

These are the stats from Kibana -> I collect the Filebeat log itself (it has been cut, but the search in Kibana is "EOF").
It seems that they appear especially during the night (off-load period).

I just checked: the logs are always the same as above, nothing on the Logstash side, EOF and ignore_older on the Filebeat side (see topic https://discuss.elastic.co/t/filebeat-file-is-falling-under-ignore-older/61210/3 )

In Filebeat I have a lot of prospectors with this configuration:

- input_type: log

  paths:
    - d:Logger\2
  exclude_lines: ["^\\s*$"]
  include_lines: ["^([a-z]|[A-Z])"]
  fields:
    catena: 11
  fields_under_root: true
  ignore_older: 10m
  close_inactive: 2m
  clean_inactive: 15m
  document_type: log2

The general Filebeat conf is the following:

output.logstash:
  hosts: ["10.10.10.10"]
  template.name: "filebeat"
  template.path: "filebeat.template.json"
  template.overwrite: false

In LS I have the following:

LS_HEAP_SIZE="8g"
LS_OPTS="-b 1000 -r --verbose"
LS_JAVA_OPTS="${LS_JAVA_OPTS} -Djava.io.tmpdir=${LS_HOME} -Xms8g"

I've checked the JVM stats and they seem to be very good.

(screenshots of the JVM stats for LS1, LS2 and LS3)


(Michele) #30

@steffens could you please tell me how to update the plugin? thanks!


(ruflin) #31

Here is the command to update the plugin: https://www.elastic.co/guide/en/beats/libbeat/current/logstash-installation.html#logstash-input-update
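
In short, it's roughly the following; the binary name depends on your Logstash version, see the linked page for details:

# Logstash 5.x
bin/logstash-plugin update logstash-input-beats
# Logstash 2.x
bin/plugin update logstash-input-beats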


(Michele) #32

We did it about 3h ago: still 3 EOF messages in the last hour (as expected they arrive during lunch time when the load is low; now it's increasing again. The next ones will be this evening when the load decreases).


(system) #33

This topic was automatically closed after 21 days. New replies are no longer allowed.


(Steffen Siering) #35

Can you please share your beats input configuration in Logstash + logs from Logstash at the times you discovered the timeouts?

As I said, EOF is not critical, as beats will reconnect. Logstash might still close connection if it finds connection to be "idle" (e.g. very very low sending rate). Which seems to be the case here. Once beats finds it has new events to send it will reconnect and send events. If load-balancing is used, these idle-phases will increase in duration, that's why connections being closed in case of load-balancing is more likely.