FileBeat - Logstash - Filter CSV - Elastic

Ajay1 · January 16, 2017, 3:51pm

I have installed Filebeat on a local server, where I want to pick up some log files from different folders on a daily basis. However, the folders also contain data for the past 30 days.
Will Filebeat pick up only the current-dated files and push them to Logstash only once a day, in other words, without shipping duplicates?

There are also other prospectors in filebeat.yml. Should I create a new Filebeat config, or add another prospector to the same yml without disturbing the existing config, which does not use Logstash as its output?

ruflin · January 17, 2017, 9:28am

If you only want to ship newer files, then you can use the ignore_older configuration option.
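A minimal sketch of `ignore_older` on a prospector, assuming Filebeat 5.x syntax (`filebeat.prospectors`). The path is borrowed from later in this thread and the `24h` value is only illustrative. `ignore_older` compares against each file's last modification time, and files that were already shipped are not re-sent from the start because Filebeat keeps their read offsets in its registry.

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/reports/completed/reports*.csv
    # Skip files whose last modification is older than 24 hours (illustrative value).
    ignore_older: 24h
```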
What do you mean by "no duplicate shipper"?
One Filebeat instance can only have one output configuration. If you want to send one prospector to Logstash (LS) and another one to Elasticsearch (ES), you need two Filebeat instances.
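As a rough sketch, assuming Filebeat 5.x, the second instance could run from its own config file that contains only the Logstash-bound prospector. The file name `filebeat-logstash.yml`, the registry path, and `localhost:5044` are hypothetical placeholders. Giving the instance its own registry file keeps the two instances from overwriting each other's state.

```yaml
# filebeat-logstash.yml: hypothetical config for a second Filebeat instance
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/reports/completed/reports*.csv

# Separate registry so this instance does not share state with the existing one
# (hypothetical path).
filebeat.registry_file: /var/lib/filebeat/registry-logstash

# This instance ships only to Logstash; the existing instance keeps its own output.
output.logstash:
  hosts: ["localhost:5044"]   # placeholder host and port
```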

ruflin · January 18, 2017, 8:00am

So are the files containing the reports new files created every day, or are new lines appended to old reports? Filebeat tracks the modification date of a file, so if a file is modified, it will fetch all the new lines added to it.

For CSV, you would do best to use the csv filter in Logstash: https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html But I think that is what you meant above, right?

Ajay1 · January 26, 2017, 4:00pm

@ruflin: Any suggestions on the case below?

Is it possible to put a date value such as `date +%Y-%m-%d` into the filename in the path?
For example: /var/log/reports/completed/reports*-`date +%Y-%m-%d`.csv

Current scenario:
I have the following config, and the folder contains files for the past 90 days. Each day, 15 new files with different names are generated. When I ran it for the first time, it picked up all 90 days of files and failed with a "too many open files" error while harvesting, and at that moment Redis also stopped working. I cleared the registry file, left it empty, and re-ran, but I still saw it pick up some random old files.
It does not restrict itself to files that are less than 24 hours old.

How can I configure it to work as intended? Do I need to remove the registry or do some other cleanup?

-
  paths:
    - /var/log/reports/completed/reports*.csv
  input_type: log
  document_type: log
  tags: ["REPORT"]
  fields:
     app: recon_files
     ignore_older: 24h
     close_inactive: 1h
     clean_inactive: 25h
  fields_under_root: true
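One thing worth checking in the config above: `ignore_older`, `close_inactive` and `clean_inactive` are indented under `fields`, so Filebeat would most likely treat them as custom event fields rather than prospector options. The log lines below reporting "close_inactive of 5m0s" (the 5.x default) are consistent with that. A sketch of the same prospector entry with those options moved up to the prospector level, assuming Filebeat 5.x:

```yaml
-
  paths:
    - /var/log/reports/completed/reports*.csv
  input_type: log
  document_type: log
  tags: ["REPORT"]
  # Prospector options belong at this level, not under `fields`.
  ignore_older: 24h     # skip files not modified in the last 24 hours
  close_inactive: 1h    # close a harvester after 1 hour without new lines
  clean_inactive: 25h   # drop registry state; must exceed ignore_older + scan_frequency
  fields:
    app: recon_files    # only real custom fields stay here
  fields_under_root: true
```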

Err: Error setting up harvester: Harvester setup failed. Unexpected file opening error: Failed opening /var/log/reports/completed/reports-08-31_07-07-02_000505.csv: open /var/log/reports/completed/reports_2015-08-31_07-07-02_000505.csv: too many open files

> ERR Connecting error publishing events (retrying): lookup server on 10.xxx.x.xx:00: dial udp 10.xxx.x.xx:00: socket: too many open files

> 2017-01-25T15:57:48+01:00 ERR Connecting error publishing events (retrying): read tcp 10.xx.xx.xx:00000->10.xxx.x.xx:0000: read: connection reset by peer
> 2017-01-25T15:58:03+01:00 INFO Non-zero metrics in the last 30s: libbeat.redis.publish.read_errors=1 libbeat.redis.publish.write_bytes=14
> 2017-01-25T15:58:33+01:00 INFO No non-zero metrics in the last 30s
> ERR Writing of registry returned error: open /var/lib/filebeat/registry.new: too many open files. Continuing...
> ERR Failed to create tempfile (/var/lib/filebeat/registry.new) for writing: open /var/lib/filebeat/registry.new: too many open files
Strangely, you can see that this one file is tried more than once, and never successfully.

> filebeat.2:2017-01-26T08:33:04+01:00 ERR Harvester could not be started on new file: /var/log/reports/completed/reports_2016-10-25_07-17-02_000451.csv, Err: prospector outlet closed
> filebeat.1:2017-01-26T08:33:09+01:00 INFO Harvester started for file: /var/log/reports/completed/reports_2016-10-25_07-17-02_000451.csv
> filebeat.1:2017-01-26T08:38:14+01:00 INFO File is inactive: /var/log/reports/completed/reports_2016-10-25_07-17-02_000451.csv. Closing because close_inactive of 5m0s reached.
> filebeat:2017-01-26T09:27:08+01:00 INFO Harvester started for file: /var/log/reports/completed/reports_2016-10-25_07-17-02_000451.csv
> filebeat:2017-01-26T09:32:13+01:00 INFO File is inactive: /var/log/reports/completed/reports_2016-10-25_07-17-02_000451.csv. Closing because close_inactive of 5m0s reached.

Filebeat shuts down, and sometimes Redis stops as well.

ruflin · January 30, 2017, 12:08pm

You can use date strings in the glob.
Try using harvester_limit if you get "too many open files" errors: https://www.elastic.co/guide/en/beats/filebeat/5.x/configuration-filebeat-options.html#harvester-limit
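A minimal sketch of `harvester_limit`, assuming Filebeat 5.x prospector syntax. The limit of `20` is an arbitrary illustrative value; it caps how many files this prospector harvests (keeps open) at the same time, which is what the "too many open files" errors point to.

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/reports/completed/reports*.csv
    ignore_older: 24h
    # Cap the number of harvesters (open files) this prospector runs in parallel.
    harvester_limit: 20   # illustrative value
```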

Let me know if this already solves your problem.

ruflin · January 31, 2017, 12:34pm

I'm really sorry, I meant to say that date strings CANNOT be used.

system · February 28, 2017, 12:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
