FileBeat EOF Error


(Pier-Hugues Pellerin) #12

@il.bert I've created this https://github.com/logstash-plugins/logstash-input-beats/issues/141 to track it.


(Michele) #13

thanks a lot for the clarification Pier, I will wait for the fix. Any idea when it will be out?
In the meantime I will try with client_inactivity_timeout

actually it was Steffen who suggested using congestion_threshold up there, but it did not solve the problem

I used such a large value because right now my log messages are tremendously small. I will try 125 and see if it works better, thanks


(Pier-Hugues Pellerin) #14

@il.bert I would stay with the defaults as much as you can; we do our best to provide a good experience out of the box.

The number of workers depends on many factors: what kind of work you are doing and what machine you have. You have to do some experimentation to find the appropriate number for you.

I will try to release a fix really soon; since it's a plugin, you will be able to update it without upgrading the whole of Logstash.


(Michele) #15

with a value of 125 I am seeing very low disk throughput (<1MB) on the Elasticsearch nodes; moving to a larger batch (15k) allowed me to get a throughput of 10MB on average with peaks of 100MB
my log lines are very small, and I read in the guide that bulks of 5-15MB are preferred, that's why I increased it :slight_smile:

btw thanks, I'll wait for the update


(Pier-Hugues Pellerin) #16

@il.bert you are right that a bulk size of 5-15MB is ideal; we have work planned to optimize the ES output to use the event size when sizing the bulk request.

But even if you use 15k, the ES output will use a max of 500 items for each bulk request. So if you have really tiny logs, it makes sense in your case to adjust the batch size, but you may also want to adjust the flush_size of the elasticsearch output, see https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-flush_size
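As a sketch, raising that cap might look roughly like this in the pipeline config (the host and the value are illustrative, not recommendations):

```conf
output {
  elasticsearch {
    hosts => ["es-node:9200"]   # hypothetical host
    # flush_size caps how many events go into a single bulk request
    # (the 500-item default cap is what's discussed above)
    flush_size => 2000
  }
}
```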


(Michele) #17

my flush size is 30k :wink: I tried pipeline batch sizes from the default 125 up to 100k
btw I have now changed it to 1250 and the average disk throughput is a little better (around 5MB). I will try putting more load on this one, but 125 is definitely too low


Missing a lot of logs sent via filebeats
(Pier-Hugues Pellerin) #18

Perfect, I was missing your logstash config :wink:


#19

I am seeing a similar error. I am using the latest filebeat-linux-arm build on a Raspberry Pi.

2016/09/22 18:50:56.453871 output.go:109: DBG  output worker: publish 156 events
2016/09/22 18:50:56.453981 sync.go:53: DBG  connect
2016/09/22 18:50:56.456413 sync.go:107: DBG  Try to publish 156 events to logstash with window size 10
2016/09/22 18:50:56.461644 client.go:194: DBG  handle error: EOF
2016/09/22 18:50:56.461826 client.go:110: DBG  closing
2016/09/22 18:50:56.462080 sync.go:78: DBG  0 events out of 156 events sent to logstash. Continue sending
2016/09/22 18:50:56.462213 sync.go:58: DBG  close connection
2016/09/22 18:50:56.462393 sync.go:85: ERR Failed to publish events caused by: EOF
2016/09/22 18:50:56.462511 single.go:91: INFO Error publishing events (retrying): EOF
2016/09/22 18:50:56.462605 sync.go:58: DBG  close connection

I don't think Logstash is to blame, because it succeeds with an older build: filebeat version 1.2.0-SNAPSHOT (arm)


(Steffen Siering) #20

@pixelrebel please start another discussion. Include details like Logstash logs and the versions being used. Does the EOF happen only from time to time, or right from the beginning?


(Maxwell Flanders) #21

Hey guys, I found the root of our issue was actually a misconfigured filter on the Logstash end that was clogging the filter workers' queue. I guess that eventually manifested itself in Filebeat's logs.

Edit: Posted this in wrong thread. Copying it to the correct thread that I started: Missing a lot of logs sent via filebeats


(Steffen Siering) #22

It's a combination of filters/outputs in Logstash slowing down the inputs + the beats input closing the connection when this happens.

Related github issue for most recent LS beats input plugin: https://github.com/logstash-plugins/logstash-input-beats/issues/141

Workarounds: older versions require you to increase congestion_threshold, newer versions require you to increase client_inactivity_timeout. In either case it's a good idea to try to figure out the reason for the slowdowns in LS, such as some expensive filters.
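As a sketch, the two workarounds would sit in the beats input block roughly like this (the port and timeout values are illustrative; which option applies depends on the plugin version, as noted above):

```conf
input {
  beats {
    port => 5044
    # older plugin versions:
    # congestion_threshold => 900       # seconds
    # newer plugin versions:
    client_inactivity_timeout => 900    # seconds (15 minutes)
  }
}
```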


(Maxwell Flanders) #23

Yeah, mine was a case where we had a DNS filter with the wrong DNS IPs in it, so it was doomed to utter failure. However, in cases where honest throughput is just so heavy that it slows you down, these tools could still definitely help. Would increasing the Logstash worker count or the Filebeat worker count be expected to help in a situation like that?


(Steffen Siering) #24

There are a number of options in filebeat to increase throughput (some at the cost of increased memory usage).

e.g. filebeat.publish_async and output.logstash.pipelined. Adding workers + enabling load balancing can increase throughput as well (depending on the mix of workers, filebeat.spool_size and output.logstash.bulk_max_size).
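A rough filebeat.yml sketch of those knobs (hosts and values are illustrative only; exact option names vary by filebeat version, so check the reference for your release):

```yaml
filebeat:
  publish_async: true    # async publishing; costs extra memory
  spool_size: 4096       # events buffered before a publish

output.logstash:
  hosts: ["ls1:5044", "ls2:5044"]   # hypothetical hosts
  loadbalance: true      # spread batches across all hosts
  worker: 2              # parallel workers per host
  bulk_max_size: 2048    # max events per batch sent to Logstash
  # pipelined: see the option mentioned above for your version
```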


(Michele) #25

Hi, sadly our tests did not go well; none of the above workarounds worked 100%, we were still getting that error.
As we had the opportunity, we moved to a larger cluster: 3 HUGE machines. We expected one of them to be able to take the whole load, but we used 3 for the test.

It happened again. EOF

So it could not be a problem of fully loaded machines, and we moved on to changing other parameters.
We found out that removing load_balance in filebeat solved the problem, or at least it seems to.

Again, using only a single Logstash output in filebeat works well (we split our sources into 3 groups and each group goes to one LS machine); we have had this configuration for 24h now and it's working pretty well.

This is not a good solution for us: we have a cluster of machines and we want it to work efficiently and reliably.
Do you have any idea why this is happening? How can we solve it?

thanks


Missing a lot of logs sent via filebeats
(Steffen Siering) #26

Can you elaborate on exactly which setups + configs you tried and how they failed (too slow, filebeat getting stuck, ...)?

Did you update the beats input plugin?

Did you set client_inactivity_timeout to 15 minutes?


(Michele) #27

Yes I did, but it brought only small improvements

Now I have Logstash with 24 workers and 8GB of heap space; everything else is default, as you suggested (better machines, no throughput problems!)
filebeat has the default batch size and timing

They fail with the EOF error in filebeat and no logs being collected from that machine.
Now I have the same configuration with no load balancing: the logger machines are divided into groups and each group is mapped to one single LS node, and it is working

The filters I use are just elapsed, grok and mutate as of now; I don't want to extend them before I know it is working

thanks


(Steffen Siering) #28

Can you share the logs from Logstash and filebeat + the filebeat configuration + the Logstash beats input settings? EOF (end of file) is triggered by the remote closing the socket for whatever reason; it's not in the control of filebeat itself. I don't see how proper load balancing in filebeat could trigger an EOF in filebeat, besides potentially longer waiting times. Plus beats should reconnect upon EOF, that is, it's not really critical.


(Michele) #29

After 2 days of running, EOF came up also with the configuration without load_balance.

These are the stats from Kibana -> I collected the filebeat log itself (it has been cut, but the search in Kibana is "EOF")
it seems that they appear especially during the night (off-load period)

I just checked: the logs are always the same as above, nothing on the Logstash side, EOF and ignore_older on the Filebeat side (see topic https://discuss.elastic.co/t/filebeat-file-is-falling-under-ignore-older/61210/3 )

in filebeat I have a lot of prospectors with this configuration

- input_type: log

  paths:
    - d:Logger\2
  exclude_lines: ["^\\s*$"]
  include_lines: ["^([a-z]|[A-Z])"]
  fields:
    catena: 11
  fields_under_root: true
  ignore_older: 10m
  close_inactive: 2m
  clean_inactive: 15m
  document_type: log2

the general filebeat config is the following

output.logstash:
  hosts: ["10.10.10.10"]
  template.name: "filebeat"
  template.path: "filebeat.template.json"
  template.overwrite: false

in LS I have the following

LS_HEAP_SIZE="8g"
LS_OPTS="-b 1000 -r --verbose"
LS_JAVA_OPTS="${LS_JAVA_OPTS} -Djava.io.tmpdir=${LS_HOME} -Xms8g"

I've checked the JVM stats and they seem to be very good

LS1 / LS2 / LS3 (JVM stats screenshots)


FileBeat file is falling under ignore older
(Michele) #30

@steffens could you please tell me how to update the plugin? Thanks!


(ruflin) #31

Here is the command to update the plugin: https://www.elastic.co/guide/en/beats/libbeat/current/logstash-installation.html#logstash-input-update
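For reference, the update command from that page is roughly the following, run from the Logstash install directory (the binary name depends on your Logstash version):

```
# Logstash 5.x:
bin/logstash-plugin update logstash-input-beats

# Logstash 2.x:
bin/plugin update logstash-input-beats
```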