Logstash Poller Error - Concurrent::TimeoutError - logstash-5.0.0-beta1

Getting this error in my logstash-plain.log. There doesn't seem to be an exact frequency from what I can tell, or exact steps to reproduce it. It is not affecting messages going to the outputs. This Logstash was installed from logstash-5.0.0-beta1.rpm. Interestingly, the error does not seem to occur when I change the log level from the default (info) to debug. Possible metric API issue? Or maybe something that just needs more precise error handling to better identify the problem? Unless of course I am missing something obvious. :slight_smile:
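
In case it's useful, the stats this poller collects appear to be the same ones exposed by the new node stats API, so they can still be pulled directly while the error is firing. A quick check, assuming the monitoring API and its _node/stats/jvm endpoint are already available in this beta on the default port 9600:

[root@logs-01 logstash]# curl -s 'http://127.0.0.1:9600/_node/stats/jvm?pretty'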

[2016-10-06T22:59:02,450][ERROR][logstash.instrument.periodicpoller.jvm] PeriodicPoller: exception {:poller=>#<LogStash::Instrument::PeriodicPoller::JVM:0x1724e9f5 @task=#<Concurrent::TimerTask:0x554897af @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x3e394ed0 @observers={#<LogStash::Instrument::PeriodicPoller::JVM:0x1724e9f5 ...>=>:update}>, @timeout_interval=60.0, @running=#<Concurrent::AtomicBoolean:0x5fa80d2a>, @StoppedEvent=#<Concurrent::Event:0x4e4c1c9f @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x3a779448 @task=#<Proc:0x19547770@/usr/share/logstash/logstash-core/lib/logstash/instrument/periodic_poller/base.rb:52>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x68af82d @set=false, @iteration=0>, @value=nil, @copy_on_deref=nil, @dup_on_deref=nil>, @peak_threads=36, @peak_open_fds=260, @metric=#<LogStash::Instrument::Metric:0x280ca487 @collector=#<LogStash::Instrument::Collector:0x4e5cd56 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x2e5fb01a @store=#<Concurrent::Map:0x148aad56 @default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x4510747a>, @fast_lookup=#<Concurrent::Map:0x8f5d8c3 @default_proc=nil>>, @observer_state=false, @snapshot_task=#<Concurrent::TimerTask:0x5bf966a7 @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x1a588093 @observers={#<LogStash::Instrument::Collector:0x4e5cd56 ...>=>:update}>, @timeout_interval=600.0, @running=#<Concurrent::AtomicBoolean:0x7a386c90>, @StoppedEvent=#<Concurrent::Event:0x6ae4f12e @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x34e6b8f9 @task=#<Proc:0x4db87d7@/usr/share/logstash/logstash-core/lib/logstash/instrument/collector.rb:87>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x3d3e2730 @set=false, @iteration=0>, @value=false, @copy_on_deref=nil, @dup_on_deref=nil>>>, @options={:polling_interval=>1, :polling_timeout=>60}>, :result=>nil, :exception=>#<Concurrent::TimeoutError: Concurrent::TimeoutError>, :executed_at=>2016-10-06 22:59:02 -0400}


What does your config look like?

[root@logs-01 logstash]# cat logstash.yml | grep -vE '^#'
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash
log.level: info

[root@logs-01 logstash]# cat jvm.options  | grep -vE '^#'
-Xms3g
-Xmx3g
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-XX:+HeapDumpOnOutOfMemoryError

[root@logs-01 logstash]# cat startup.options  | grep -vE '^#'
JAVACMD=/usr/bin/java
LS_HOME=/usr/share/logstash
LS_SETTINGS_DIR=/etc/logstash
LS_OPTS="--path.settings ${LS_SETTINGS_DIR}"
LS_JAVA_OPTS=""
LS_PIDFILE=/var/run/logstash.pid
LS_USER=logstash
LS_GROUP=logstash
LS_GC_LOG_FILE=/var/log/logstash/gc.log
LS_OPEN_FILES=16384
LS_NICE=19
SERVICE_NAME="logstash"
SERVICE_DESCRIPTION="logstash"

[root@logs-01 logstash]# cat log4j2.properties | grep -vE '^#'
status = error
name = LogstashPropertiesConfig
appender.rolling.type = RollingFile
appender.rolling.name = plain_rolling
appender.rolling.fileName = ${sys:ls.logs}/logstash-${sys:ls.log.format}.log
appender.rolling.filePattern = ${sys:ls.logs}/logstash-${sys:ls.log.format}-%d{yyyy-MM-dd}.log
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %.10000m%n
appender.json_rolling.type = RollingFile
appender.json_rolling.name = json_rolling
appender.json_rolling.fileName = ${sys:ls.logs}/logstash-${sys:ls.log.format}.log
appender.json_rolling.filePattern = ${sys:ls.logs}/logstash-${sys:ls.log.format}-%d{yyyy-MM-dd}.log
appender.json_rolling.policies.type = Policies
appender.json_rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.json_rolling.policies.time.interval = 1
appender.json_rolling.policies.time.modulate = true
appender.json_rolling.layout.type = JSONLayout
appender.json_rolling.layout.compact = true
appender.json_rolling.layout.eventEol = true
rootLogger.level = ${sys:ls.log.level}
rootLogger.appenderRef.rolling.ref = ${sys:ls.log.format}_rolling

[root@logs-01 logstash]# cat patterns/* | grep -vE '^#'
ACTION [a-zA-Z0-9\-_]+
SERVICE_TICKET ST\-.*\-%{HOSTNAME}
TGT TGT\-.*\-%{HOSTNAME}
CONSUMER %{URIPROTO}://%{URIHOST}%{URIPATH}
PARTIAL_SENTENCE .+[^\:\.\;]
JAVA_PROCESS [a-zA-Z0-9\.\-_]+
LOG4J2_GREEDY .+[^\|]
SHIBBOLETH_PROCESS [a-zA-Z0-9\-\._]+
TIMESTAMP_SHIBBOLETH %{YEAR}%{MONTHNUM}%{MONTHDAY}T%{HOUR}%{MINUTE}%{SECOND}Z
URN urn:[0-9A-Za-z][0-9A-Za-z-]{0,31}:(?:%[0-9a-fA-F]{2}|[0-9A-Za-z()+,.:=@;$_!*'/?#-])+
SESSION [a-zA-Z0-9\-_]+
SHIBBOLETH_ATTRIBUTES [a-zA-Z0-9\-_\,]+
SHIB_URN (%{URN}|%{CONSUMER})
USERNAME_UNKNOWN [a-zA-Z0-9\-_:]+
[root@logs-01 logstash]# cat conf.d/* | grep -vE '^#'
input {
  beats {
    port            => 5044
    type            => "beats"
  }
}
input {
  beats {
    type => "json"
    codec => "json"
    port => 5045
  }
}

input {
  beats {
    type => "tomcat"
    port => 5046
  }
}

input {
  beats {
    type => "shibboleth"
    port => 5047
  }
}
filter {
    if ("apache" in [document_type] and "{" in [message] and "}" in [message]) {
      json{
        source => "message"
      }
      date {
        match => [ "timeMillis", "UNIX_MS" ]
      }
    }
}
filter {
    if ( [type] == "cas" ) {
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{DATESTAMP_OTHER:timestamp}\|%{WORD:service}\|%{GREEDYDATA:message}"]
        overwrite => ["message"]
      }
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{SERVICE_TICKET:service_ticket} for %{CONSUMER:consumer}(?:%{URIPARAM})?\|%{ACTION:action}\|%{USERNAME:sso_username}\|%{IPORHOST:client_ip}\|%{HOSTNAME:instance}"]
      }
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{SERVICE_TICKET:service_ticket}\|%{ACTION:action}\|%{USERNAME:sso_username}\|%{IPORHOST:client_ip}\|%{HOSTNAME:instance}"]
      }
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{TGT:ticket_granting_ticket}\|%{ACTION:action}\|%{USERNAME_UNKNOWN:sso_username}\|%{IPORHOST:client_ip}\|%{HOSTNAME:instance}"]
      }
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "Supplied credentials:.*Issuer=%{CONSUMER:provider},Audience.*Issued On=%{TIMESTAMP_ISO8601:provider_session_created_on}.*,Valid Before=%{TIMESTAMP_ISO8601:provider_session_expires},.*UPN=\[%{USERNAME:sso_username}\].*\}\]\]\|%{ACTION:action}.*\}\]\|%{IPORHOST:client_ip}\|%{HOSTNAME:instance}"]
      }
    }
}
filter {
    if ( [type] == "tomcat" ) {
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{TIMESTAMP_ISO8601:timestamp} \| %{LOGLEVEL:log_level} \| %{WORD:service} \| %{JAVA_PROCESS:tomcat_process} \- %{GREEDYDATA:message}"]
        overwrite => [ "message" ]
      }
    }
}
filter {
    if ( [type] == "shibboleth" ) {
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{TIMESTAMP_ISO8601:idp_timestamp} \- %{LOGLEVEL:log_level} \[%{JAVA_PROCESS:tomcat_process}:%{INT}\] \- %{GREEDYDATA:message}"]
        overwrite => [ "message" ]
      }
      grok {
        patterns_dir => ["/etc/logstash/patterns"]
        match => ["message", "%{TIMESTAMP_SHIBBOLETH:shib_timestamp}\|%{URN:urn}\|%{SESSION:shibboleth_session}\|%{CONSUMER:consumer}\|%{SHIB_URN}\|%{CONSUMER:provider}\|%{URN:urn}\|%{SESSION:shibboleth_session}\|%{USERNAME:sso_username}\|%{URN:urn}\|%{SHIBBOLETH_ATTRIBUTES:shib_attributes}\|"]
        match => ["message", "Profile Action %{WORD:shibboleth_action}: %{PARTIAL_SENTENCE:shibboleth_action_result}: %{USERNAME:sso_username}"]
        match => ["message", "%{TIMESTAMP_SHIBBOLETH:shib_timestamp}\|%{CONSUMER:consumer}\|%{WORD:shibboleth_action}\|%{USERNAME:sso_username}\|"]
      }
      if ( [shib_attributes] and "," in [shib_attributes] ) {
        mutate {
          split => { "shib_attributes" => "," }
        }
      }
      if ( [idp_timestamp] ) {
        date {
          match => [ "idp_timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
        }
      }
    }
}
filter {
    if ( [client_ip] ) {
      geoip {
        source => "client_ip"
        database => "/etc/logstash/geoip/GeoLite2-City.mmdb"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
      }
      mutate {
        convert => [ "[geoip][coordinates]", "float"]
      }
    }
}
output {
  elasticsearch {
    hosts => ["10.200.235.227:9200"]
    sniffing => false
    manage_template => false
    index => "sso-%{[fields][data_type]}-%{+YYYY.MM.dd}"
  }
}

Confirming that I'm seeing this as well:

[2016-10-13T14:43:23,197][ERROR][logstash.instrument.periodicpoller.jvm] PeriodicPoller: exception {:poller=>#<LogStash::Instrument::PeriodicPoller::JVM:0x629951d9 @task=#<Concurrent::TimerTask:0x754b8b32 @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x6e6221f0 @observers={#<LogStash::Instrument::PeriodicPoller::JVM:0x629951d9 ...>=>:update}>, @timeout_interval=60.0, @running=#<Concurrent::AtomicBoolean:0x5f0bc8bb>, @StoppedEvent=#<Concurrent::Event:0x76b0ca73 @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x9a218f4 @task=#<Proc:0x565b36ca@/usr/share/logstash/logstash-core/lib/logstash/instrument/periodic_poller/base.rb:52>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x571462aa @set=false, @iteration=0>, @value=nil, @copy_on_deref=nil, @dup_on_deref=nil>, @peak_threads=30, @peak_open_fds=65, @metric=#<LogStash::Instrument::Metric:0x2787d969 @collector=#<LogStash::Instrument::Collector:0x6cb537ea @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x3e7b00d5 @store=#<Concurrent::Map:0x31199cb6 @default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x45eca90d>, @fast_lookup=#<Concurrent::Map:0x5cb96c48 @default_proc=nil>>, @observer_state=true, @snapshot_task=#<Concurrent::TimerTask:0x3aa2a2a2 @observers=#<Concurrent::Collection::CopyOnNotifyObserverSet:0x47a3656c @observers={#<LogStash::Instrument::Collector:0x6cb537ea ...>=>:update}>, @timeout_interval=600.0, @running=#<Concurrent::AtomicBoolean:0x706be3c4>, @StoppedEvent=#<Concurrent::Event:0x5f73f8d9 @set=false, @iteration=0>, @execution_interval=1.0, @do_nothing_on_deref=true, @run_now=nil, @freeze_on_deref=nil, @executor=#<Concurrent::SafeTaskExecutor:0x3a77e5e @task=#<Proc:0x5c9d33d6@/usr/share/logstash/logstash-core/lib/logstash/instrument/collector.rb:87>, @exception_class=StandardError>, @StopEvent=#<Concurrent::Event:0x73479e42 @set=false, @iteration=0>, @value=false, @copy_on_deref=nil, @dup_on_deref=nil>>>, @options={:polling_interval=>1, :polling_timeout=>60}>, :result=>nil, :exception=>#<Concurrent::TimeoutError: Concurrent::TimeoutError>, :executed_at=>2016-10-13 14:43:23 -0600}

Confirming that I am still seeing this in RC1.

I'm seeing this with the LS 5.0 release version too.

Were you able to resolve it?

Thanks,
Sanjiv

I had the error through RC1 - however, I have not seen it yet since upgrading to 5.0.0 GA.

When I have a small number of file inputs I don't see this, but when the number of file inputs is >50 I start seeing it. I'll log a separate bug report for this.

Thanks,
Sanjiv

That could be something.....

For me it wasn't dependent on the number of inputs but rather on the number of events (messages) per second, which could equate to your multiple inputs depending on data flow.

That's good to know. I'll keep that in mind when trying to further troubleshoot by pointing to files that don't have much logging.

Can anyone explain what this logstash.instrument.periodicpoller.jvm periodic poller is doing? Is it safe to disable, and if so, how? It doesn't appear to be related to processing the inputs, but rather to JVM metrics collection.

I might have to set the log level of logstash.instrument.periodicpoller.jvm to FATAL to avoid seeing this noise.

Thanks,
Sanjiv

The poller gathers system stats for the JVM/OS, and this error means it took too long to complete that operation.
Some system calls take more time or run at a lower priority, so they can cause the poller to time out.

I've created 2 issues related to that:


The fixes are easy to do; for now, I think you can hide them using the log4j settings.
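
Something like this in log4j2.properties should quiet it — a rough sketch; the "poller_jvm" logger id is just a label I picked (only the name/level keys matter), and depending on the log4j2 version you may also need to list the id in a top-level "loggers" property. Using logstash.instrument.periodicpoller as the name instead would cover the other pollers too.

logger.poller_jvm.name = logstash.instrument.periodicpoller.jvm
logger.poller_jvm.level = fatal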

Thank you for the explanation. It makes me feel better getting confirmation that it's okay to hide them using a log4j setting. Do we know the overhead of the JVM stat collector? Would being able to disable it via a setting help improve the performance of a heavily loaded LS instance? I ran LS 2.2.2 with the same config as LS 5.0: a change made to a log file was picked up immediately by the LS 2.2.2 file input, but took 12-20s on LS 5.0, and during that period I saw a bunch of these timeout errors. Not sure if it's related to the slowdown of the file notification in LS 5.0, or unrelated.

Was this resolved?

I am also facing exactly the same issue.