Harvesting Old Files That Were Recently Modified

Hello All,

I am unable to get filebeat to harvest a file that was created years ago but recently modified.

Background:
Curator deletes any Elasticsearch index older than 14 days.
The file was created in 2013 but still gets written to a couple of times a day, and this one log file contains log entries going back to 2013.
All Filebeat clean_* configs are left at their defaults.
ignore_older is set to 336h
close_inactive is set to 5m
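For reference, the relevant prospector section of my filebeat.yml looks roughly like this (the path and Logstash host are placeholders, not the exact values from my setup):

```
filebeat.prospectors:
  - input_type: log
    paths:
      - 'L:\Logs\Debug\AdminService.log'  # placeholder; the real config covers more files
    ignore_older: 336h                    # 14 days, matching the Curator retention
    close_inactive: 5m
    tail_files: false
    # clean_inactive / clean_removed are left at their defaults

output.logstash:
  hosts: ["logstash-host:5044"]           # placeholder host
```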

When I try to read the file with tail_files: false, Filebeat reads the whole file and a ton of old daily indices get created for 2013 onward, e.g. logstash-2013.01.03.
This causes a ton of issues with the Elastic Stack, which eventually fails. Errors like the following show up:

"[2017-01-18T08:53:41,774][DEBUG][o.e.a.b.TransportShardBulkAction] [hostname] [logstash-2014.06.26][4] failed to execute bulk item (index) index {[logstash-2014.06.26][log][AVmyfw5BX3wdBTDUiiu-], source[{"Category":"msoulscat_WSS_General","offset":133355,"ThrdId":"22","input_type":"log","Pid":"3200","source":"L:\\Logs\\Debug\\AdminService.log","message":"Exiting SPAdvApi32.CheckRestartService","type":"log","tags":["adminservice","beats","filebeats","beats_input_codec_plain_applied"],"@timestamp":"2014-06-26T13:11:09.000Z","@version":"1","beat":{"hostname":"hostname","name":"hostname","version":"5.1.2"},"host":"hostname","TagId":"0","Level":"Verbose","filter_node":"ist000282"}]}
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.ClusterService.lambda$null$4(ClusterService.java:449) ~[elasticsearch-5.1.2.jar:5.1.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:458) ~[elasticsearch-5.1.2.jar:5.1.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]"

Also on the master node:
'your index map is too complex'

Normally ES contains about 304 shards, but when this problem occurs approximately 1,500 shards are being reassigned, which seems to shut everything down.

With that said, Filebeat will not read this old file even if I also delete the registry and add tail_files: true.

Any suggestions?

As far as I can see, you send the data through LS to be processed. I assume that in your Logstash config you create daily indices based on the event timestamps. Since the file is 4 years old and has a few events for every day, you end up creating roughly 4 × 365 ≈ 1460 indices plus the corresponding shards. I would recommend not creating daily indices for this file, to prevent the problem above.
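Something along these lines in your Logstash output (host and index name are just examples) would send those old events to a single index instead of one index per day:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # example host
    # one fixed index instead of the default daily logstash-%{+YYYY.MM.dd} pattern
    index => "adminservice-logs"
  }
}
```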

About not reading the file: Which filebeat version are you using? Can you share some filebeat log files?

Thanks for your response, ruflin. I am using Filebeat 5.1.2.

I tried the latter part above on a new log source, and this one read the 4-year-old file with tail_files enabled. However, it still read the 2013 logs.

I will try your first suggestion.

Did you clean the registry before restarting? Could you share some log output from Filebeat, at least at the info level?
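In case it helps, info-level file logging can be enabled with something like this in filebeat.yml (the path is just an example, adjust it to your install):

```
logging.level: info
logging.to_files: true
logging.files:
  path: C:\ProgramData\filebeat\logs   # example location
```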
