Multiple pipelines utilizing s3 output cause each other to terminate

In Logstash, I am using the s3 output plugin in multiple pipelines. Each plugin instance stores temporary files to disk and has the restore option set to true. My problem: when one pipeline restarts, it deletes the temporary files of the pipelines that are not restarting, and those pipelines later terminate when the restore feature cannot find the deleted temporary files.

Disabling restore for the s3 outputs of course avoids this problem.

Additionally, when all of the pipelines' configurations are combined into a single pipeline, the problem does not occur (of course, then there is only one pipeline that can restart :wink: )

Is this multi-pipeline use case a known problem for the s3 output plugin? If not, how can I prevent this from happening?

For example, Pipelines ABC and XYZ both use the s3 output plugin. When Pipeline ABC restarts, I see the temporary file for Pipeline XYZ get deleted even though the temporary file's 5-minute rotation interval has not yet elapsed.
Logstash's log then states that, as part of Pipeline ABC's restart, the "corrupted" temporary file for Pipeline XYZ has been "recovered" (in reality, it no longer exists).

(snipped)
Jun 16 18:35:38 logstashHost logstash: [2023-06-16T18:35:38,927][INFO ][logstash.pipelineaction.reload] Reloading pipeline {"pipeline.id"=>:"ABC"}
Jun 16 18:35:59 logstashHost logstash: [2023-06-16T18:35:59,877][WARN ][org.logstash.plugins.integration.outputs.s3.GzipUtil][ABC] Corrupted file recovered, path:/var/lib/logstash/s3-output-tmp/6b9aec2a-5f52-4822-b7cb-5f31a22a2d33/PIPELINE-XYZ-LOGS/2023/06/16/ls.s3.8bdd9841-2c1c-4042-bb66-1d624e838e35.2023-06-16T18.35.part1-recovered.txt
Jun 16 18:36:20 logstashHost logstash: [2023-06-16T18:36:20,860][INFO ][logstash.javapipeline    ][ABC] Pipeline started {"pipeline.id"=>"ABC"}
Jun 16 18:36:20 logstashHost logstash: [2023-06-16T18:36:20,865][INFO ][logstash.agent           ] Pipelines running {:count=>2, :running_pipelines=>[:XYZ, :"ABC"], :non_running_pipelines=>[]}

A few minutes later, Pipeline XYZ terminates because the recovery of its s3 output temporary file fails due to the file's non-existence.

(snipped)
Jun 16 18:40:59 logstashHost logstash: [2023-06-16T18:40:59,641][ERROR][logstash.javapipeline    ][XYZ] Pipeline worker error, the pipeline will be stopped {:pipeline_id=>"XYZ", :error=>"(IOError) finished", :exception=>Java::OrgJrubyExceptions::IOError, :backtrace=>["org.jruby.ext.zlib.JZlibRubyGzipWriter.write(org/jruby/ext/zlib/JZlibRubyGzipWriter.java:376)", "usr.share.logstash.vendor.jruby.lib.ruby.stdlib.forwardable.write(/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/forwardable.rb:230)", "usr.share.logstash.vendor.jruby.lib.ruby.stdlib.forwardable.write(/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/forwardable.rb:230)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.multi_receive_encoded(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:248)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.get_file(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:145)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.get_factory(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:107)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.with_lock(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:27)", "org.jruby.ext.monitor.Monitor.synchronize(org/jruby/ext/monitor/Monitor.java:82)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.with_lock(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:26)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.get_factory(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:103)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.file_repository.get_file(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:145)", "usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.multi_receive_encoded(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:248)", "org.jruby.RubyArray.each(org/jruby/RubyArray.java:1865)", 
"usr.share.logstash.vendor.bundle.jruby.$2_dot_6_dot_0.gems.logstash_minus_integration_minus_aws_minus_7_dot_1_dot_1_minus_java.lib.logstash.outputs.s3.multi_receive_encoded(/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:243)", "usr.share.logstash.logstash_minus_core.lib.logstash.outputs.base.multi_receive(/usr/share/logstash/logstash-core/lib/logstash/outputs/base.rb:103)", "org.logstash.config.ir.compiler.AbstractOutputDelegatorExt.multi_receive(org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121)", "RUBY.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:304)"], :thread=>"#<Thread:0x65ab9289@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 sleep>"}
Jun 16 18:41:02 logstashHost logstash: [2023-06-16T18:41:02,951][ERROR][logstash.javapipeline    ][XYZ] Pipeline error {:pipeline_id=>"XYZ", :exception=>#<Errno::ENOENT: No such file or directory - No such file or directory - /var/lib/logstash/s3-output-tmp/6b9aec2a-5f52-4822-b7cb-5f31a22a2d33/PIPELINE-XYZ-LOGS/2023/06/16/ls.s3.8bdd9841-2c1c-4042-bb66-1d624e838e35.2023-06-16T18.35.part1.txt.gz>, :backtrace=>["org/jruby/RubyFileTest.java:248:in `size'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/temporary_file.rb:50:in `size'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:369:in `upload_file'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:274:in `block in close'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:89:in `block in each_files'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:132:in `block in each_factory'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:27:in `block in with_lock'", "org/jruby/ext/monitor/Monitor.java:82:in `synchronize'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:26:in `with_lock'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:131:in `block in each_factory'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:129:in `each_factory'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3/file_repository.rb:88:in `each_files'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-integration-aws-7.1.1-java/lib/logstash/outputs/s3.rb:273:in `close'", "/usr/share/logstash/logstash-core/lib/logstash/plugin.rb:98:in `do_close'", "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:75:in `do_close'", "org/jruby/RubyArray.java:1865:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:483:in `shutdown_workers'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:209:in `run'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:146:in `block in start'"], "pipeline.sources"=>["/etc/logstash/conf.d/pipeline_XYZ/XYZ.conf"], :thread=>"#<Thread:0x65ab9289@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
Jun 16 18:41:02 logstashHost logstash: [2023-06-16T18:41:02,952][INFO ][logstash.javapipeline    ][XYZ] Pipeline terminated {"pipeline.id"=>"XYZ"}

Can you share your pipeline configurations?

Also, in the logs you shared, the temporary directory is the same for both pipelines. Does this issue still happen if you use a different tmp directory for each pipeline?
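
For example, something like this (just a sketch, not a full config; the bucket name and the per-pipeline subdirectory names below are placeholders):

Pipeline ABC:

s3 {
  bucket => "your-bucket"
  temporary_directory => "/var/lib/logstash/s3-output-tmp/ABC"
  restore => true
}

Pipeline XYZ:

s3 {
  bucket => "your-bucket"
  temporary_directory => "/var/lib/logstash/s3-output-tmp/XYZ"
  restore => true
}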

Which version of LS are you running?
Check this.

So far, from my testing, @leandrojmp's suggestion to use a different tmp directory for each pipeline (more precisely, a different tmp directory for each s3 output block within each pipeline -- some pipelines have configs with multiple s3 blocks gated by conditionals, a leftover from when all the configs were part of a single pipeline) seems to have resolved the issue.

The symptoms were nearly identical to those in the GitHub ticket that @Rios shared (as well as this other one), which is interesting, since those tickets suggest that this problem was supposedly fixed in earlier versions of the output plugins.

For posterity's sake, I was using the following versions:

  • Logstash 8.8.1 (rpm-based package)
  • logstash-output-s3 as bundled within logstash-integration-aws 7.1.1

My problematic configs were like the following...

Pipeline ABC:

output {
  if [type] == "ABC-logs" {
    s3 {
      region => "ap-northeast-1"
      bucket => "MyBucketName"
      prefix => "ABC-logs/%{+YYYY}/%{+MM}/%{+dd}"
      temporary_directory => "/var/lib/logstash/s3-output-tmp"
      encoding => "gzip"
      rotation_strategy => "time"
      time_file => 5
      restore => true
    }
  }
  #...one other lumberjack output snipped...
}

Pipeline XYZ:

output {
  if [type] == "FOO-logs" {
    s3 {
      region => "ap-northeast-1"
      bucket => "MyBucketName"
      prefix => "FOO-logs/%{+YYYY}/%{+MM}/%{+dd}"
      temporary_directory => "/var/lib/logstash/s3-output-tmp"
      encoding => "gzip"
      rotation_strategy => "time"
      time_file => 5
      restore => true
    }
  }
  if [type] == "BAR-logs" {
    s3 {
      region => "ap-northeast-1"
      bucket => "MyBucketName"
      prefix => "BAR-logs/%{+YYYY}/%{+MM}/%{+dd}"
      temporary_directory => "/var/lib/logstash/s3-output-tmp"
      encoding => "gzip"
      rotation_strategy => "time"
      time_file => 5
      restore => true
    }
  }
  #...one other lumberjack output snipped...
}

As mentioned above, configuring each s3 block to use a unique temporary_directory seems to avoid the problem.
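
For example, a fixed version of Pipeline XYZ's output section would look like this (the subdirectory names under the tmp path are just my illustration; any paths should work as long as each s3 block gets its own):

output {
  if [type] == "FOO-logs" {
    s3 {
      region => "ap-northeast-1"
      bucket => "MyBucketName"
      prefix => "FOO-logs/%{+YYYY}/%{+MM}/%{+dd}"
      # unique temporary_directory for this s3 block
      temporary_directory => "/var/lib/logstash/s3-output-tmp/XYZ-FOO"
      encoding => "gzip"
      rotation_strategy => "time"
      time_file => 5
      restore => true
    }
  }
  if [type] == "BAR-logs" {
    s3 {
      region => "ap-northeast-1"
      bucket => "MyBucketName"
      prefix => "BAR-logs/%{+YYYY}/%{+MM}/%{+dd}"
      # unique temporary_directory for this s3 block
      temporary_directory => "/var/lib/logstash/s3-output-tmp/XYZ-BAR"
      encoding => "gzip"
      rotation_strategy => "time"
      time_file => 5
      restore => true
    }
  }
}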

The code is blowing up here. The outputs are shutting down, and each one is iterating over the files in the temporary directory to upload them to s3. I find it entirely unsurprising that there is a race condition where two outputs try to delete the same file and one of them fails.

Plus it is a documented requirement for each output to have its own temporary directory. See the second "Important!" in this section of the documentation.


Many thanks for pointing both of these out!

Plus it is a documented requirement for each output to have its own temporary directory. See the second "Important!" in this section of the documentation.

<facepalm>
Re-Note to Self (nth time): when troubleshooting, Reading The Fine Manual again is more efficient -- and enlightening -- than bashing your head against the wall...
