Logstash 2.3.2 hung with no useful logging

Hi all,

I was tuning my Logstash instance, working on:

  • pipeline-workers,
  • pipeline-batch-size,
  • elasticsearch output workers.
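
For reference, this is roughly how those knobs were being set between runs (the config file name and the concrete values are just examples from my tests):

# pipeline workers (-w) and batch size (-b) are command-line flags in Logstash 2.3
/opt/logstash/bin/logstash agent -f provaperformance.conf -w 4 -b 250

The elasticsearch output workers are set with the workers option in the output block; the full config is further down.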

I ran Logstash several times over the past few days, each time killing it with -9 once I had collected enough performance metrics. The last time, after changing the elasticsearch output workers from 5 to 6, Logstash hung.
The process is actually running and it keeps populating logstash.log, but it is not processing any events. Killing it and starting it again, or rolling back to the previous working configuration, does not solve the issue.

To better understand the issue I raised the log level to debug and captured an strace. However, I am not able to fully understand what is happening. Can you help me?
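
For completeness, the strace was captured against the running process roughly like this (the pgrep pattern is just what matches on my box, adjust as needed):

# attach to the running Logstash JVM and follow all of its threads
strace -f -p $(pgrep -f logstash) -o /tmp/logstash.strace

and debug logging was enabled by starting Logstash with --debug and -l /var/log/logstash/logstash-debug-goretti.log.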

Thanks a lot,
Daniele

The Logstash log keeps repeating the following lines (note that the elasticsearch output buffer is empty, @buffer=[], and it is just flushing on its interval):

{:timestamp=>"2016-05-25T10:57:20.155000+0200", :message=>"Flushing buffer at interval", :instance=>"#<LogStash::Outputs::ElasticSearch::Buffer:0x6129873 @operations_mutex=#<Mutex:0x8d566c1>, @max_size=500, @operations_lock=#<Java::JavaUtilConcurrentLocks::ReentrantLock:0xceaf303>, @submit_proc=#<Proc:0x5baba4b4@/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.6.2-java/lib/logstash/outputs/elasticsearch/common.rb:57>, @logger=#<Cabin::Channel:0x786b226e @metrics=#<Cabin::Metrics:0x23c2b48b @metrics_lock=#<Mutex:0x29c793b5>, @metrics={}, @channel=#<Cabin::Channel:0x786b226e ...>>, @subscriber_lock=#<Mutex:0x2f4f1c3d>, @level=:debug, @subscribers={12528=>#<Cabin::Subscriber:0x38c6b84d @output=#<Cabin::Outputs::IO:0x2d89cedb @io=#<File:/var/log/logstash/logstash-debug-goretti.log>, @lock=#<Mutex:0x7a7081a3>>, @options={}>}, @data={}>, @last_flush=2016-05-25 10:57:19 +0200, @flush_interval=1, @stopping=#<Concurrent::AtomicBoolean:0x45466476>, @buffer=[], @flush_thread=#<Thread:0x48cfb843 run>>", :interval=>1, :level=>:debug, :file=>"logstash/outputs/elasticsearch/buffer.rb", :line=>"90", :method=>"interval_flush"}
{:timestamp=>"2016-05-25T10:57:20.681000+0200", :message=>"Pushing flush onto pipeline", :level=>:debug, :file=>"logstash/pipeline.rb", :line=>"458", :method=>"flush"}
{:timestamp=>"2016-05-25T10:57:20.684000+0200", :message=>"Flushing", :plugin=><LogStash::Filters::Metrics meter=>["events"], add_tag=>["metric"], flush_interval=>10, periodic_flush=>false, ignore_older_than=>0, clear_interval=>-1, rates=>[1, 5, 15], percentiles=>[1, 5, 10, 90, 95, 99, 100]>, :level=>:debug, :file=>"(eval)", :line=>"28", :method=>"initialize"}
{:timestamp=>"2016-05-25T10:57:21.160000+0200", :message=>"Flushing buffer at interval", :instance=>"#<LogStash::Outputs::ElasticSearch::Buffer:0x6129873 @operations_mutex=#<Mutex:0x8d566c1>, @max_size=500, @operations_lock=#<Java::JavaUtilConcurrentLocks::ReentrantLock:0xceaf303>, @submit_proc=#<Proc:0x5baba4b4@/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-2.6.2-java/lib/logstash/outputs/elasticsearch/common.rb:57>, @logger=#<Cabin::Channel:0x786b226e @metrics=#<Cabin::Metrics:0x23c2b48b @metrics_lock=#<Mutex:0x29c793b5>, @metrics={}, @channel=#<Cabin::Channel:0x786b226e ...>>, @subscriber_lock=#<Mutex:0x2f4f1c3d>, @level=:debug, @subscribers={12528=>#<Cabin::Subscriber:0x38c6b84d @output=#<Cabin::Outputs::IO:0x2d89cedb @io=#<File:/var/log/logstash/logstash-debug-goretti.log>, @lock=#<Mutex:0x7a7081a3>>, @options={}>}, @data={}>, @last_flush=2016-05-25 10:57:20 +0200, @flush_interval=1, @stopping=#<Concurrent::AtomicBoolean:0x45466476>, @buffer=[], @flush_thread=#<Thread:0x48cfb843 run>>", :interval=>1, :level=>:debug, :file=>"logstash/outputs/elasticsearch/buffer.rb", :line=>"90", :method=>"interval_flush"}

strace -f keeps repeating the following lines (mostly futex waits timing out, plus a stat() of one of the input files):

25041 futex(0x7f105002ca54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1041214, 386230534}, ffffffff <unfinished ...>
25044 ioctl(5, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7f10b19d8e90) = -1 ENOTTY (Inappropriate ioctl for device)
25044 write(5, "{:timestamp=>\"2016-05-25T10:57:1"..., 287) = 287
25044 stat("/home/user/ls-provaperformance/standard-query.log", {st_mode=S_IFREG|0644, st_size=3912294291, ...}) = 0
25044 futex(0x7f1050035154, FUTEX_WAIT_BITSET_PRIVATE, 1, {1041214, 386940726}, ffffffff <unfinished ...>
25070 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
25070 futex(0x7f105025fc28, FUTEX_WAKE_PRIVATE, 1) = 0
25070 futex(0x7f105025fc54, FUTEX_WAIT_BITSET_PRIVATE, 1, {1041213, 496346755}, ffffffff <unfinished ...>
25036 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
25036 futex(0x7f112c1e9728, FUTEX_WAKE_PRIVATE, 1) = 0
25036 futex(0x7f112c1e9754, FUTEX_WAIT_BITSET_PRIVATE, 1, {1041213, 472205457}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
25036 futex(0x7f112c1e9728, FUTEX_WAKE_PRIVATE, 1) = 0
25036 futex(0x7f112c1e9754, FUTEX_WAIT_BITSET_PRIVATE, 1, {1041213, 522501456}, ffffffff <unfinished ...>

What's your config look like?

Here it is:

input {
  # Check Point logs
  file {
    path => "/home/user/ls-provaperformance/standard-pcom-be.log"
    type => "checkpoint"
    start_position  => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/home/user/ls-provaperformance/standard-pcom-fe.log"
    type => "checkpoint"
    start_position  => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/home/user/ls-provaperformance/standard-pcom-pi.log"
    type => "checkpoint"
    start_position  => "beginning"
    sincedb_path => "/dev/null"
  }
  file {
    path => "/home/user/ls-provaperformance/standard-query.log"
    type => "dns"
    start_position  => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  metrics {
    meter => "events"
    add_tag => "metric"
    flush_interval => 10
  }
}
output {
  if "metric" in [tags] {
    file {
      path => "/var/log/logstash/event-rate.log"
      codec => line {
        format => "%{[@timestamp]} --> 1m rate: %{[events][rate_1m]} - 5m rate: %{[events][rate_5m]} - 15m rate: %{[events][rate_15m]}"
      }
    }
  }
  elasticsearch {
    hosts => [ "192.168.78.25","192.168.78.35","192.168.78.36","192.168.78.37","192.168.78.38","192.168.78.39" ]
    index => "provaperformance"
#    workers => 5
  }
}

Note that the workers option has been commented out in the latest runs; the issue is still there, though.

Shame on me, it's really embarrassing to be so incompetent :sweat_smile:

just a word.... ignore_older

argh
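
To close the loop for anyone who hits the same thing: the input files had not been modified for longer than the file input's ignore_older threshold (in this version it defaults to one day, if I remember correctly), so the file inputs were silently skipping them and Logstash just sat there flushing an empty elasticsearch buffer without ever reading an event. A sketch of the workaround, applied to each file input (the value is only an example, anything larger than the age of your files will do):

file {
  path => "/home/user/ls-provaperformance/standard-query.log"
  type => "dns"
  start_position => "beginning"
  sincedb_path => "/dev/null"
  ignore_older => 31536000   # one year in seconds, so old test files are not skipped
}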