Pipeline continues when stopping Logstash service

Logstash v.7.13.2

Hi all,

I'm successfully running three Logstash pipelines via the following pipelines.yml file to keep an Elasticsearch index up to date with changes to Oracle database records and Windows logs:

- pipeline.id: upsert
  queue.type: persisted
  queue.checkpoint.writes: 1
  path.config: "<PATH TO UPSERT CONFIG>"
  path.queue: "<PATH TO UPSERT QUEUE>"
- pipeline.id: delete
  queue.type: persisted
  queue.checkpoint.writes: 1
  path.config: "<PATH TO DELETE CONFIG>"
  path.queue: "<PATH TO DELETE QUEUE>"
- pipeline.id: winlogs
  path.config: "<PATH TO WINLOGS CONFIG>"
  path.queue: "<PATH TO WINLOGS QUEUE>"

My logstash.yml file:

path.data: "<path/to/data/dir>"
path.logs: "<path/to/logs/dir>"
pipeline.unsafe_shutdown: true
pipeline.separate_logs: true

Within the upsert and delete pipelines, I schedule a connection to my DB via the jdbc input plugin, do some filtering and processing, and output to Elasticsearch. My winlogs pipeline simply opens a port for incoming Beats logs from my servers. The pipelines are started and stopped via a systemd service, because I run a daily scheduled full rebuild of my index to ensure it is up to date with my database, and the pipelines must be shut down during the build while the index is temporarily unavailable.

# upsert/delete pipeline in
input {
    jdbc {
        jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
        jdbc_driver_library => "<path/to/jdbc.jar>"
        jdbc_connection_string => "jdbc:oracle:thin:@<connection>"            
        jdbc_default_timezone => "<timezone>"
        jdbc_user => "<user>"
        jdbc_password => "<pass>"
        statement_filepath => "<path/to/sql/statement>"
        schedule => "* * * * *"
        last_run_metadata_path => "<path/to/timestamp>"
    }
}

# winlogs pipeline in
input {
    beats {
        port => 5044
    }
}

Until now, this has worked fine. However, I recently noticed that during one of my full builds, the upsert pipeline refused to stop. The logs confirm this: the delete and winlogs pipelines were both terminated, but the upsert pipeline did not register the stop at all. Checking the status of the service, I see it has been deactivating for five hours now.

I had thought that when a pipeline registers a stop, all in-flight events are processed and the pipeline is then terminated. This doesn't seem to be the case with my upsert pipeline. When I check the number of events in the queue, I find zero, so it can't be that it's still processing in-flight events. Until I solve this problem, the other pipelines are also stuck waiting for the service to restart, so it's a big problem.

I appreciate any feedback!

I suggest getting a thread dump from the JVM and looking at what the threads are hung up on.
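
For anyone following along, a thread dump can be taken with the JDK's `jstack` tool and then summarized. What follows is only a minimal sketch, not something posted in this thread: the function names are made up for illustration, it assumes `jstack` is on the PATH and is run as the same user as the Logstash JVM, and it assumes pipeline threads carry a `[pipeline.id]` prefix in their names (e.g. `[upsert]>worker0`), which is worth checking against your own dump.

```shell
# Write a thread dump of the JVM with the given PID to a file.
# Assumption: a JDK with jstack is installed and we run as the JVM's user.
capture_thread_dump() {
    jstack -l "$1" > "$2"
}

# Print "STATE COUNT" for every thread state found in a dump file.
summarize_thread_states() {
    awk '/java.lang.Thread.State:/ { count[$2]++ }
         END { for (s in count) print s, count[s] }' "$1" | sort
}

# Print the header lines of threads whose name starts with "[<pipeline id>]".
# Assumption: Logstash names its pipeline threads this way.
pipeline_threads() {
    grep -F "\"[$2]" "$1"
}
```

On a systemd host where the unit's main process is the JVM, this might be used as `capture_thread_dump "$(systemctl show --property MainPID --value logstash)" /tmp/logstash-threads.txt`, followed by `pipeline_threads /tmp/logstash-threads.txt upsert` (the unit name `logstash` and the output path are assumptions). Logstash's monitoring API also offers `GET /_node/hot_threads` on its HTTP port, 9600 by default, which may be enough if a full dump is not needed.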

Thanks for the tip @Badger. Ultimately, we never found the exact reason that the pipeline didn't stop. We suspect it may have something to do with a synchronization problem between the processing of persisted queue events, the writing of the jdbc input's last-run SQL timestamps, and the scheduling.

In the end, we opted for killing the threads directly. Since we stop the pipelines only in order to rebuild the index and bring it back in sync with our database, any events lost by killing the pipelines are brought into the index during the rebuild anyway, so no data is lost. For anyone who absolutely needs the queued events, though, this is not a wise solution.
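
As a rough illustration of the "stop, then kill" approach, here is a sketch under some assumptions. It is not the exact procedure we used: the helper name `stop_or_kill` is hypothetical, it assumes Logstash runs as a systemd unit, and it kills the whole service process rather than individual threads.

```shell
# Hypothetical helper: ask systemd to stop a unit, wait up to a deadline,
# then force-kill whatever is left.
stop_or_kill() {
    unit="$1"
    deadline="$2"   # seconds to wait for a clean stop

    # Request the stop without blocking, so the waiting is done here.
    systemctl stop --no-block "$unit"

    waited=0
    while [ "$waited" -lt "$deadline" ]; do
        # is-active exits non-zero unless the unit is active; only the text matters.
        state=$(systemctl is-active "$unit" || true)
        case "$state" in
            inactive|failed) return 0 ;;
        esac
        sleep 1
        waited=$((waited + 1))
    done

    # Still not stopped: send SIGKILL to the unit's processes.
    # Events that were not yet persisted or acknowledged may be lost.
    systemctl kill --signal=SIGKILL "$unit"
}
```

A call such as `stop_or_kill logstash 120` would give the service two minutes before killing it (the unit name and the deadline are assumptions). Setting `TimeoutStopSec=` in the unit file is another option, since systemd itself sends SIGKILL once that timeout expires.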