Logstash systemd service restarts itself

Hello community,

I'm running a Logstash instance with the following config via a systemd service for minute-by-minute updates of an ES index:

input {
        jdbc {
                jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
                jdbc_driver_library => "/usr/local/bin/ojdbc8.jar"
                jdbc_connection_string => "jdbc:oracle:thin:@<DB ADDRESS>"
                jdbc_user => "<user>"
                jdbc_password => "<pass>"
                statement_filepath => "<path>/<to>/<SQL file>"
                schedule => "* * * * *"
                last_run_metadata_path => "<path>/<to>/<jdbc_metadata_file>"
                enable_metric => false
                tracking_column => "date_changed"
                use_column_value => true
                tracking_column_type => "timestamp"
        }
}
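
The output side is omitted here; it boils down to an elasticsearch output writing the rows into the index, roughly like this (host, index/alias name and the ID field are placeholders, not our real values):

output {
        elasticsearch {
                hosts => ["<ES host>:9200"]
                index => "<index or alias>"
                document_id => "%{id}"   # placeholder: primary key column from the SQL statement
        }
}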

For hosting our data we have two indices, an active one and an inactive one. The active index serves data to our application. The inactive index is used for daily full builds (which take around 4-5 minutes). After rebuilding the inactive index, we switch the active and inactive aliases; this keeps all records available to the application and also catches any records the scheduled update pipeline may have missed.
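
For reference, the switch can be done atomically with the _aliases API, roughly like this (host, index and alias names are placeholders):

curl -X POST "http://<ES host>:9200/_aliases" -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "remove": { "index": "<index A>", "alias": "<active alias>" } },
    { "add":    { "index": "<index B>", "alias": "<active alias>" } },
    { "remove": { "index": "<index B>", "alias": "<inactive alias>" } },
    { "add":    { "index": "<index A>", "alias": "<inactive alias>" } }
  ]
}'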

The pipeline runs and updates without issue. However, we've noticed that the service restarts itself during the full build of the inactive index. As a result, records updated by the restarted service are missing once the aliases are switched.

Broken down into a simple example:

11:20am - Pipeline running; sets metadata timestamp to 11:20
11:21am - Pipeline stopped for the full build; build starts; metadata timestamp is 11:21
11:22am-11:24am - Service mysteriously restarts; the mystery pipeline runs on its schedule every minute and overwrites the metadata timestamp
11:25am - Build ends; aliases are switched; pipeline is restarted
Expected: the first update run after the switch uses the 11:21 metadata timestamp
Actual: if the mysteriously started pipeline was in the middle of a scheduled run when the build ended, the metadata timestamp is already around 11:25
11:26am - First pipeline run after the full build uses the 11:25 metadata timestamp
-> Result: records updated between 11:21am and 11:25am are never picked up after the switch.
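
The overwrite is easy to observe: the file configured as last_run_metadata_path only holds the last tracked value, so watching it during a build shows the timestamp moving forward, e.g.:

watch -n 30 cat "<path>/<to>/<jdbc_metadata_file>"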

We configure our service like this:

[Unit]
Description=logstash

[Service]
Type=simple
User=logstash
Group=logstash
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings=/home/apps/logstash/config"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=16384

TimeoutStopSec=infinity

[Install]
WantedBy=multi-user.target

and stop the service by calling service logstash stop. According to the systemd service documentation,

When the death of the process is a result of systemd operation (e.g. service stop or restart), the service will not be restarted.

So although I define Restart=always, the service shouldn't restart itself, since we stop it with service logstash stop. However, it does! And it overwrites the jdbc metadata timestamp recorded when the pipeline was stopped for the full build, so records go missing.
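
For reference, restarts triggered by systemd itself should be visible with standard systemctl/journalctl calls, roughly like this (NRestarts needs a reasonably recent systemd; the time window is from the example above):

# unit start/stop events around the build window
journalctl -u logstash --since "11:15" --until "11:30"
# restart counter and main-process info for the unit
systemctl show logstash -p NRestarts -p ExecMainPID -p ExecMainStartTimestamp
# which logstash processes exist and whether they belong to the unit's cgroup
systemctl status logstash
ps -eo pid,ppid,cmd | grep -i logstash | grep -v grep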

Does anyone have experience with this? It's a rather specific use case, but I would appreciate any feedback.

I would be looking for anything else that might be managing services, e.g. (a few quick checks are sketched after the list):

  • Pacemaker or other cluster resource managers (is crmd or pcsd running?)
  • Puppet, Chef or other configuration management tools
  • Monitoring tools (monit, nagios, etc....)
  • Cron... other background processes
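
Something like this should surface most of those (rough shell sketch; adjust the names to your distro):

systemctl list-units --type=service --state=running | grep -Ei 'pacemaker|pcsd|corosync|puppet|chef|monit|nagios'
ps aux | grep -Ei 'crmd|pcsd|puppet|chef-client|monit' | grep -v grep
# cron entries that might (re)start logstash
crontab -l 2>/dev/null | grep -i logstash
grep -ri logstash /etc/cron* 2>/dev/null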

PS. I'm not sure whether systemd behaves differently when called via 'service'... but I do know that systemd can be a little different on different systems; what type of Linux system are you running Logstash on?

Hi @cknz ,

Thanks for the ideas! I've looked into those service managers and I'm pretty sure the service isn't being started by any of them. After some testing, I noticed that if I stop the services manually, they don't start again until I restart them manually.

The problem appears to be that when I start a Logstash instance separately from the service, the service is started as well (rough reproduction sketched below). Since I've localized the problem, I created a new issue for this:
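
Roughly what reproduces it (the binary path is the one from the unit file above; the standalone pipeline config is just a placeholder):

systemctl stop logstash
systemctl status logstash    # unit is inactive (dead) at this point
# start a standalone Logstash instance outside of systemd
/usr/share/logstash/bin/logstash -f <path>/<to>/<other pipeline>.conf
systemctl status logstash    # shortly afterwards the unit is active again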
