I am running an ELK stack and tailoring the Logstash configuration to extract data from several different logs. Because the logs are formatted differently, I need to create multiple file inputs. The following is the current configuration:
However, with my current configuration Logstash reads files in the directory /usr/share/logstash/ingest_data/app/zsh/ twice, even though I explicitly configured the first file input to exclude files in that directory. I checked the file /usr/share/logstash/finalised_data/logstash_completed.log, and two entries are created when I add a single file.
I cannot find an explanation for this other than unintentional behaviour when having multiple file inputs.
To debug the issue, I tried using only one of the file inputs at a time. With only the first input, a zsh log is not read; with only the second input, a zsh log is read just once, as intended.
I am running a Docker Compose stack as described here, with version 8.7.1.
When you have multiple file inputs, you need to configure sincedb_path explicitly.
Path of the sincedb database file (keeps track of the current position of monitored log files) that will be written to disk. The default will write sincedb files to <path.data>/plugins/inputs/file. NOTE: it must be a file path and not a directory path.
If not set, the values of both file inputs will overwrite each other. Maybe this is the reason for your issues.
Thanks for your answer. I have tried to follow your suggestion by setting the sincedb_path explicitly (the field file_input is just to make debugging easier for me):
After doing some additional testing, I have realised that the issue is not with multiple filters - my bad. When using only the initial file input, logs are still processed in the pipeline even though they are not supposed to be.
With that out of the way (and maybe I should create a new topic, since the title does not describe the true issue), your suggestion of splitting the path into several paths is doable, but I am planning my project to be widely extendable, i.e. I want to have many directories/subdirectories. Thus, it is not a suitable solution for me.
In my understanding of the docs, it is possible to exclude all files in a subdirectory even though its parent directory is part of the path option.
I have tried to strip down my Logstash configuration, which reads files located in /usr/share/logstash/ingest_data/app/zsh/, as much as possible:
I do not expect that to work. As @Wolfram_Haussig said, the exclude option takes a filename pattern. If you look at the source code you will see that it is calling basename so all of the directory names are discarded before the comparison is made.
My understanding is that fnmatch? requires the whole pattern to match. Thus basename will reduce /usr/share/logstash/ingest_data/app/zsh/foo.txt to foo.txt, and foo.txt does not match /usr/share/logstash/ingest_data/app/zsh/*. In terms of the source, watched_file.pathname would match, but watched_file.pathname.basename does not.
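To make the basename behaviour concrete, here is a minimal Ruby sketch that can be run outside Logstash. It is not copied from the plugin source: the helper name excluded_by_basename? is hypothetical, and the sketch only assumes that the comparison is done with Pathname#fnmatch? on the basename, as described above.

```ruby
require 'pathname'

# Hypothetical stand-in for the plugin's exclude check (an assumption, not the
# actual source): only the file name, without its directories, is compared.
def excluded_by_basename?(path, pattern)
  Pathname.new(path).basename.fnmatch?(pattern)
end

path        = "/usr/share/logstash/ingest_data/app/zsh/foo.txt"
dir_pattern = "/usr/share/logstash/ingest_data/app/zsh/*"

# Matching the full path against the directory pattern succeeds...
full_match = Pathname.new(path).fnmatch?(dir_pattern)
# ...but after basename only "foo.txt" is left, so the same pattern fails.
base_match = excluded_by_basename?(path, dir_pattern)
# A pure filename pattern is what the basename comparison can match.
name_match = excluded_by_basename?(path, "*.txt")

puts "full path vs directory pattern: #{full_match}"
puts "basename vs directory pattern:  #{base_match}"
puts "basename vs filename pattern:   #{name_match}"
```

In this sketch only the first and third comparisons succeed, which is consistent with a directory-style exclude pattern having no effect.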
Finally, I understand how exclude works. Thank you both, @Wolfram_Haussig and @Badger. The capital 'I' in "In Tail mode..." in the docs misled me into believing it described two different examples, but now I understand it is a single example with a path (path => /var/log/*) and a pattern (exclude => "*.gz") for excluding gzip files.
I marked @Wolfram_Haussig's answer as the solution, as it best described a solution to my question.
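For anyone else who misreads that docs example, the combination of a path glob and a filename-only exclude pattern can be emulated in a few lines of Ruby. This is only a sketch: apply_exclude and the candidate file list are made up for illustration, and it assumes exclude patterns are compared against the basename as discussed in this thread.

```ruby
require 'pathname'

# Hypothetical helper (not part of Logstash): keep the paths whose file name
# matches none of the exclude patterns.
def apply_exclude(paths, exclude_patterns)
  paths.reject do |p|
    base = Pathname.new(p).basename
    exclude_patterns.any? { |pattern| base.fnmatch?(pattern) }
  end
end

# Illustrative files that a path of /var/log/* could discover.
candidates = ["/var/log/syslog", "/var/log/syslog.1.gz", "/var/log/auth.log"]

kept = apply_exclude(candidates, ["*.gz"])
puts kept
```

Here the .gz file is dropped because "*.gz" matches its file name, while the other two paths are kept.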