Filebeat is harvesting the same log over and over

Hi everyone,

Two days ago I upgraded Logstash and Filebeat from 5.6.x to 6.0. I had some fun with index templates and document_type, but after sorting that out I ran into some strange Filebeat behavior.

The only config change I made was replacing document_type with tags:

- type: log
  tags: ["report_daily"]
  paths:
    - /usr/src/the-report/report_daily*.csv


- type: log
  tags: ["report_month"]
  paths:
    - /usr/src/the-report/report_month*.csv

Then, while researching this problem, I added these options:

- type: log
  tags: ["report_daily"]
  paths:
    - /usr/src/the-report/report_daily*.csv
  ignore_older: 1h
  close_inactive: 30s

- type: log
  tags: ["report_month"]
  paths:
    - /usr/src/the-report/report_month*.csv
  ignore_older: 1h
  close_inactive: 30s

Neither config change helped: once Filebeat is done with the file (judging by the document count in Elasticsearch/Kibana), it starts harvesting it again.

For example, the monthly index should contain around 196k docs, but now it climbs to 380k or higher...

About the .csv file:
every hour it is moved into an archive folder, and if there were changes in the source I pull the .csv data from, I drop a file with a new name into the log (report) folder so Filebeat can pick it up.
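
Roughly, the hourly handling looks like this (just a sketch to illustrate the workflow; the archive path and the new file name are made up):

# every hour: move the current report out of the watched folder
mv /usr/src/the-report/report_month_oct_2017.csv /usr/src/the-report/archive/

# if the source changed: drop a fresh export with a new name into the report folder
cp /tmp/export/report_month_oct_2017_v2.csv /usr/src/the-report/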

Sure, I could roll back to 5.6.x, but I'd prefer to understand the issue :smiley:

Ah, here is the relevant registry entry:

{"source":"/usr/src/the-report/report_month_oct_2017.csv","offset":0,"timestamp":"2017-11-30T12:09:24.767910017+01:00","ttl":-1,"type":"log","FileStateOS":{"inode":31620,"device":51713}}]

I appreciate any help, thanks in advance.

Cheers


I have a hard time following you here.

What did your original configs look like (full config please)?

Why did you swap to tags?

Which output type are you using?

Any errors in filebeat logs?

Hi @steffens

Here is the original config (versions 5.5.1 -> 5.6.x):

filebeat.prospectors:

- input_type: log
  document_type: report_daily
  paths:
    - /usr/src/the-report/report_daily*.csv

- input_type: log
  document_type: report_month
  paths:
    - /usr/src/the-report/report_month*.csv

With 6.0, document_type is not supported anymore, so I changed to tags (maybe later to fields).
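
If I switch to fields later, I guess it would look something like this (just an untested sketch; the field name report_type is arbitrary):

- type: log
  paths:
    - /usr/src/the-report/report_daily*.csv
  fields:
    report_type: report_daily
  fields_under_root: true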

The output is the same and wasn't changed: Logstash with SSL.

output.logstash:
  hosts: ["logs.example.com:5044"]
  ssl.certificate_authorities: ["/certs/rootCA.pem"]
  ssl.certificate: "/certs/logs.pem"
  ssl.key: "/certs/logs.key"

There is nothing else in the config.
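
In case it helps, I can also check the config and the SSL connection to Logstash from the Filebeat host with the test subcommand that came with 6.0 (the config path below is just the package default, adjust as needed):

# verify the config parses
filebeat test config -c /etc/filebeat/filebeat.yml

# verify the connection to logs.example.com:5044 (including TLS) works
filebeat test output -c /etc/filebeat/filebeat.yml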

In Filebeat I can't see any errors; it starts like before and begins harvesting the .csv file, but when it reaches the end it immediately starts harvesting the same file again, without any changes to the .csv file and without any delay.

Did I miss something new in how Filebeat 6.0 handles files?

Cheers

The fact that the offset is still 0 is kind of funny. How big are those CSV files? To me it looks like Filebeat is missing ACKs from Logstash. Filebeat retries indefinitely (at-least-once delivery semantics): if it does not receive an ACK or keep-alive signal from Logstash, it has to resend.

Which logstash version are you using?

Can you try with debug logging in Filebeat as well (run with -d '*')? Check the logs for registry write messages. Also check the metrics being logged (by default Filebeat logs internal metrics every 30 seconds).
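
Something like this should do it (adjust the config path to your setup):

# -e logs to stderr, -d "*" enables all debug selectors
filebeat -e -d "*" -c /etc/filebeat/filebeat.yml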

Hi @steffens ,

sorry for the delay (weekend and some other work came up).

  1. The monthly file is now 115 MB (a .csv file // AWS billing reports).
  2. Logstash is 6.0, same as Filebeat; before 6.0 it was working :confused:
  3. In the logs I can see that Filebeat occasionally gets a timeout on the Logstash beats port, but it retries and in the end I can see all the data in Kibana.
    * I know this because I compare the billing $ in Kibana and in the AWS dashboard; they're equal, until Filebeat goes crazy.

I will let it run with -d '*' tomorrow morning and update here.

Thanks for the help.

Cheers

Hi @steffens,

I can see in the logs that at the beginning there are two i/o timeout errors on the connection to Logstash, but after that it works without any problems.

 2017-12-05T09:34:45+01:00 ERR  Failed to publish events caused by: read tcp 127.0.0.1:54128->127.0.0.1:5044: i/o timeout
2017-12-05T09:34:45+01:00 ERR  Failed to publish events caused by: read tcp 127.0.0.1:54128->127.0.0.1:5044: i/o timeout
2017-12-05T09:34:45+01:00 ERR  Failed to publish events caused by: client is not connected
2017-12-05T09:34:46+01:00 ERR  Failed to publish events: client is not connected

That happens twice, but after that it shows no errors:

2017-12-05T09:42:42+01:00 INFO Non-zero metrics in the last 30s: beat.memstats.gc_next=15036192 beat.memstats.memory_alloc=13363488 beat.memstats.memory_total=336135752 filebeat.events.added=2048 filebeat.events.done=2048 filebeat.harvester.open_files=1 filebeat.harvester.running=1 libbeat.config.module.running=0 libbeat.output.events.acked=2048 libbeat.output.events.batches=1 libbeat.output.events.total=2048 libbeat.output.read.bytes=35 libbeat.output.write.bytes=393536 libbeat.pipeline.clients=2 libbeat.pipeline.events.active=4117 libbeat.pipeline.events.published=2048 libbeat.pipeline.events.total=2048 libbeat.pipeline.queue.acked=2048 registrar.states.current=1 registrar.states.update=2048 registrar.writes=1
2017-12-05T09:43:12+01:00 INFO Non-zero metrics in the last 30s: beat.memstats.gc_next=18894176 beat.memstats.memory_alloc=9708720 beat.memstats.memory_total=349468056 filebeat.events.added=2048 filebeat.events.done=2048 filebeat.harvester.open_files=1 filebeat.harvester.running=1 libbeat.config.module.running=0 libbeat.output.events.acked=2048 libbeat.output.events.batches=1 libbeat.output.events.total=2048 libbeat.output.read.bytes=35 libbeat.output.write.bytes=388869 libbeat.pipeline.clients=2 libbeat.pipeline.events.active=4117 libbeat.pipeline.events.published=2048 libbeat.pipeline.events.total=2048 libbeat.pipeline.queue.acked=2048 registrar.states.current=1 registrar.states.update=2048 registrar.writes=1

That's why I don't understand how Filebeat just goes crazy and starts harvesting the file over and over :confused:

I appreciate your help; maybe I'm missing something big.

Cheers

EDIT: looks like it works again, strange... and I have no clue what happened to make Filebeat go crazy.

Did you check whether the offset in the registry file is being updated?

Any logs mentioning truncate?

Filebeat logs when it starts/stops processing a file. Can you see in those messages whether the file is being picked up again?
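
A quick way to check both (the paths assume the deb/rpm package defaults, adjust if you run from a tarball):

# watch whether the offset for the monthly report moves in the registry
grep -o '"source":"[^"]*report_month[^"]*","offset":[0-9]*' /var/lib/filebeat/registry

# look for harvester start/stop and truncation messages
grep -iE "harvester|truncate" /var/log/filebeat/filebeat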

Logs + metrics when filebeat is acting up would be helpful as well. The i/o timeout occurs when filebeat is waiting for ACK from Logstash. Logstash either sends an ACK or a keep-alive about every 5 seconds. The timeout in filebeat occurs after 30s.

Any problems in Logstash (queues filling up, outputs blocking, or slow grok)?

How many beats do you have pushing to the logstash instance?
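
If it does turn out to be the ACK/keep-alive timing, you can also experiment with the timeout on the Filebeat side (sketch of the relevant setting only, keep your existing ssl.* options; the default is 30s). On the Logstash side, the beats input has a client_inactivity_timeout (default 60s) you could raise as well:

output.logstash:
  hosts: ["logs.example.com:5044"]
  # how long Filebeat waits for an ACK or keep-alive from Logstash
  # before it drops the connection and resends the batch (default: 30s)
  timeout: 60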

I have the same error. Here is the Filebeat log (Docker image, 6.0.0, started with command: -e -d "*" in my docker-compose.yml, connected to a logstash:6.0.0), which keeps repeating:

2017/12/13 20:49:09.809097 client.go:197: DBG handle error: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809285 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809306 client.go:114: DBG closing
2017/12/13 20:49:09.809375 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809455 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809502 async.go:143: DBG 91 events out of 91 events sent to logstash. Continue sending
2017/12/13 20:49:09.809570 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809460 logger.go:22: INFO retryer: send wait signal to consumer
2017/12/13 20:49:09.809595 logger.go:22: INFO   done
2017/12/13 20:49:09.809702 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.809733 async.go:235: ERR Failed to publish events caused by: read tcp 172.18.0.5:47845->172.18.0.3:5044: i/o timeout
2017/12/13 20:49:09.816752 async.go:143: DBG 159 events out of 159 events sent to logstash. Continue sending
2017/12/13 20:49:09.816799 async.go:99: DBG close connection
2017/12/13 20:49:09.816825 async.go:99: DBG close connection
2017/12/13 20:49:09.816858 async.go:235: ERR Failed to publish events caused by: client is not connected
2017/12/13 20:49:10.817036 output.go:92: ERR Failed to publish events: client is not connected
2017/12/13 20:49:10.817077 async.go:94: DBG connect
2017/12/13 20:49:10.906815 async.go:143: DBG 2048 events out of 2048 events sent to logstash. Continue sending
2017/12/13 20:49:10.919211 async.go:143: DBG 289 events out of 289 events sent to logstash. Continue sending
2017/12/13 20:49:10.926479 async.go:143: DBG 170 events out of 170 events sent to logstash. Continue sending
2017/12/13 20:49:10.931673 async.go:143: DBG 111 events out of 111 events sent to logstash. Continue sending
2017/12/13 20:49:10.935736 async.go:143: DBG 98 events out of 98 events sent to logstash. Continue sending
2017/12/13 20:49:10.935899 logger.go:22: INFO retryer: send unwait-signal to consumer
2017/12/13 20:49:10.935921 logger.go:22: INFO   done
2017/12/13 20:49:32.494594 metrics.go:39: INFO Non-zero metrics in the last 30s: beat.memstats.gc_next=61527536 beat.memstats.memory_alloc=50120920 beat.memstats.memory_total=2237070864 filebeat.harvester.open_files=1 filebeat.harvester.running=5 libbeat.config.module.running=1 libbeat.output.events.batches=7 libbeat.output.events.failed=2966 libbeat.output.events.total=2966 libbeat.output.read.errors=1 libbeat.output.write.bytes=662955 libbeat.pipeline.clients=2 libbeat.pipeline.events.active=4117 libbeat.pipeline.events.retry=4096 registrar.states.current=462
