Harvesting Old Files That Were Recently Modified

Hello All,

I am unable to get filebeat to harvest a file that was created years ago but recently modified.

Background:
Curator deletes any Elasticsearch index older than 14 days.
The file was created in 2013 but still gets written to a couple of times a day, and this one log file contains log entries going back to 2013.
All Filebeat clean_* configs are left at their defaults.
ignore_older is set to 336h
close_inactive is set to 5m
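For reference, the relevant prospector section of my filebeat.yml looks roughly like this (the path and Logstash host are placeholders, not the exact values from my setup):

```
filebeat.prospectors:
  - input_type: log
    paths:
      - 'L:\Logs\Debug\AdminService.log'  # placeholder; the real config covers more files
    ignore_older: 336h                    # 14 days, matching the Curator retention
    close_inactive: 5m
    tail_files: false
    # clean_inactive / clean_removed are left at their defaults

output.logstash:
  hosts: ["logstash-host:5044"]           # placeholder host
```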

When I try to read the file with tail_files: false, Filebeat reads the whole file and a ton of old daily indices get created for 2013 onward, e.g. logstash-2013.01.03.
This causes a ton of issues with the Elastic Stack, which eventually fails. Errors like the following show up:

"[2017-01-18T08:53:41,774][DEBUG][o.e.a.b.TransportShardBulkAction] [hostname] [logstash-2014.06.26][4] failed to execute bulk item (index) index {[logstash-2014.06.26][log][AVmyfw5BX3wdBTDUiiu-], source[{"Category":"msoulscat_WSS_General","offset":133355,"ThrdId":"22","input_type":"log","Pid":"3200","source":"L:\\Logs\\Debug\\AdminService.log","message":"Exiting SPAdvApi32.CheckRestartService","type":"log","tags":["adminservice","beats","filebeats","beats_input_codec_plain_applied"],"@timestamp":"2014-06-26T13:11:09.000Z","@version":"1","beat":{"hostname":"hostname","name":"hostname","version":"5.1.2"},"host":"hostname","TagId":"0","Level":"Verbose","filter_node":"ist000282"}]}
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService.lambda$null$4(ClusterService.java:449) ~[elasticsearch-5.1.2.jar:5.1.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) ~[elasticsearch-5.1.2.jar:5.1.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]"

Also on the master node:
'your index map is too complex'

Normally ES contains about 304 shards, but when this problem occurs approximately 1,500 shards are being reassigned, which seems to shut everything down.

With that said, Filebeat will not read this old file even if I also delete the registry and add tail_files: true.

Any suggestions?

As far as I can see, you send the data through LS to be processed. I assume that in your Logstash config you create daily indices based on the event timestamps. Since the file is 4 years old and has a few events for every day, you end up creating roughly 4 × 365 ≈ 1460 indices plus the corresponding shards. I would recommend not creating daily indices for this file, to prevent the problem above.
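Something along these lines in your Logstash output (host and index name are just examples) would send those old events to a single index instead of one index per day:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # example host
    # one fixed index instead of the default daily logstash-%{+YYYY.MM.dd} pattern
    index => "adminservice-logs"
  }
}
```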

About not reading the file: Which filebeat version are you using? Can you share some filebeat log files?

Thanks for your response, ruflin. I am using Filebeat 5.1.2.

I tried the latter part above on a new log source, and this one read the 4-year-old file with tail_files enabled. However, it still read the 2013 logs.

I will try your first suggestion.

Did you clean the registry before restarting? Could you share some log output from Filebeat, at least at the info level?
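In case it helps, info-level file logging can be enabled with something like this in filebeat.yml (the path is just an example, adjust it to your install):

```
logging.level: info
logging.to_files: true
logging.files:
  path: C:\ProgramData\filebeat\logs   # example location
```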
