Scalability of Logstash when using path => "/path/to/file/*/*/*/*/*/test.log"

Hello,

I'm new to the ELK stack and am trying to write a Logstash configuration file to monitor a dynamic file structure. I already have two approaches that work, but I really want to learn which is the better way.

Approach 1:

input {
  file {
    path => "/tasks/**/**/**/**/**/test.log" 
  }
}

Approach 2 (relying on Logstash's config reloading feature):

input {
  file {
    path => "/tasks/2020/July/31/23/59/task1/test.log" 
  }
}

Basically, only one test.log (produced by a task, as in the second example) needs to be parsed by Logstash at any given time; Logstash shouldn't keep watching old logs, such as the one from task0. But since the directory structure in between keeps changing, I'm not sure whether approach 1 can scale when millions of tasks come in.

For the second approach, I was lucky to come across a Logstash feature that lets me reload the config file every time there is a new test.log, so Logstash no longer needs to watch the old logs. For example, I can change
path => "/tasks/2020/July/31/23/59/task1/test.log"
to
path => "/tasks/2021/August/21/23/59/task1000/test.log" if I want.

Questions:
Which approach do you think will work better in my situation? Or can you explain to me how the Logstash file input plugin works when I use a pattern like "/tasks/**/**/**/**/**/test.log"? Does it search every second? Does keeping track of old logs that will never be updated again affect Logstash's performance?

Thank you very much!

It might make more sense to use Filebeat here. It supports recursive globbing, which should make things a little easier in the config file.
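
If you stay with Logstash, the file input has a few settings that bear on your questions about how often the pattern is re-expanded and what happens to old files. The following is only a rough sketch: the option names come from the file input plugin, but every value is an illustrative assumption, as is the simplified "/tasks/**/test.log" pattern, so check them against the documentation for your plugin version before relying on them.

input {
  file {
    # Assumed pattern: a single recursive ** in place of the repeated ones
    path => "/tasks/**/test.log"
    # How often already-discovered files are checked for new data
    stat_interval => "1 second"
    # The glob is re-expanded every discover_interval * stat_interval
    discover_interval => 15
    # Skip files last modified longer ago than this (value is an assumption)
    ignore_older => "1 hour"
    # Close the handle of a file that has not been read for this long
    close_older => "10 minutes"
    # Drop sincedb entries for files with no recent activity
    sincedb_clean_after => "1 day"
  }
}

With settings along these lines, old test.log files are still found when the pattern is expanded, but they should be ignored or closed rather than kept open. Whether that holds up with millions of task directories is something you would need to measure.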
