Hi everyone,
I'm new to the ELK stack and figuring out how to solve this problem.
Let's say I have the following directory structure:
/logs/dir_x/dir_y/dir_z/task1/log1.csv, code1.py
/logs/dir_a/dir_b/dir_c/task2/log2.csv, code2.py
..... (more will come with time)
/logs/dir_o/dir_k/dir_t/task100000/log100000.csv, code100000.py
where we always have 3 directories between the logs directory and the task directory, and each task directory will always contain a log.csv file and a code.py Python file that produces the log.csv file.
Here is my current Logstash configuration that listens to all of the log files:
input {
  file {
    # exactly 3 directories, then the task directory
    path => "/logs/*/*/*/*/*.csv"
    start_position => "beginning"
    # don't persist read positions across restarts
    sincedb_path => "/dev/null"
  }
}
filter {
  # some code here, same for all logs
}
output {
  # some code here, same for all logs
}
Although this approach works, I don't think it will scale in my project: there are thousands of subdirectories in between, and having Logstash keep watching all of them may eventually become very slow.
So my new approach is to add another Python file inside each task directory, say logstash_runner.py, that will invoke the Logstash configuration (which will then know the absolute path to log1.csv) when code1.py starts (which produces log1.csv), and stop Logstash from listening to log1.csv when code1.py ends. Is there any way for my Logstash configuration file to accept a different path setting for each task, based on one general configuration file (with a fixed filter and output part), so that I don't have to write a new Logstash configuration file in each task directory?
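For reference, here is a rough sketch of the per-task runner I have in mind. It assumes the shared config reads its input path from an environment variable (Logstash substitutes ${VAR} from the environment in config files), and the config path and LOG_PATH variable name are placeholders of mine:

```python
# logstash_runner.py -- sketch of the per-task runner described above.
# Assumes the shared config's input section contains:
#   path => "${LOG_PATH}"
import os
import subprocess


def build_command(log_path, config="/etc/logstash/shared.conf"):
    """Return the command and environment for a per-task Logstash run."""
    env = dict(os.environ, LOG_PATH=log_path)  # pass the csv path to the config
    cmd = ["logstash", "-f", config]           # one shared config for all tasks
    return cmd, env


if __name__ == "__main__":
    task_dir = os.path.dirname(os.path.abspath(__file__))
    cmd, env = build_command(os.path.join(task_dir, "log1.csv"))
    proc = subprocess.Popen(cmd, env=env)  # start when code1.py starts
    # ... wait for code1.py to finish, then:
    proc.terminate()                       # stop listening when it ends
```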
I'm also thinking about writing a configuration file generator that copies the Logstash filter and output sections and generates the input part. However, I want to make sure there isn't a more elegant way to handle this.
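Concretely, the generator I'm imagining would just stitch the fixed filter/output sections onto a per-task input block, roughly like this (the section contents here are placeholders):

```python
# generate_config.py -- sketch of the config generator mentioned above.
# FILTER_AND_OUTPUT stands in for the fixed filter/output sections.
FILTER_AND_OUTPUT = """filter {
  # same code for all logs
}
output {
  # same code for all logs
}
"""


def generate_config(csv_path):
    """Build a complete Logstash config for one task's log file."""
    input_block = (
        'input {\n'
        '  file {\n'
        f'    path => "{csv_path}"\n'
        '    start_position => "beginning"\n'
        '    sincedb_path => "/dev/null"\n'
        '  }\n'
        '}\n'
    )
    return input_block + FILTER_AND_OUTPUT
```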
Thank you very much