Logstash upgrade issue - version 8.11.3

Hello Team,
Good evening!

Today I upgraded Logstash from version 8.10.4 to 8.11.3. After the upgrade, Logstash keeps restarting and throwing the errors below.

This FATAL error appears for all pipelines, and I am not able to identify the exact problem in order to fix it.

Because of this issue, data is not being processed and sent to the Kafka system.

{"level":"FATAL","loggerName":"org.logstash.Logstash","timeMillis":1703864917361,"thread":"[l4a-logstash]>worker0","logEvent":{"message":"uncaught error (in thread [1-logstash]>worker0)"}}

Note: on the older version, Logstash was working fine as expected.

Could someone please help me with this?

Thanks
Siva

There are probably other log lines before that one. Please share them, as they may hold a useful hint about the underlying issue.

  • Can you provide more log lines from logstash-plain.log? Please enable debug or trace mode and provide more details.
  • How many pipelines are there?
  • Have you checked the journal?
  • Have you tried moving all .conf files aside and starting with a basic .conf with a simple input, filter, and output (see the sketch below)?
  • Are you using SSL communication with Kafka? Check the certs and permissions.
  • Did you use an rpm upgrade or a clean install?
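
For reference, a minimal throwaway pipeline along these lines would show whether the crashes come from the pipeline configs or from the installation itself; everything in it (the file name, the marker field) is just a placeholder:

    # smoke-test.conf - hypothetical minimal pipeline, run with: bin/logstash -f smoke-test.conf
    input  { stdin {} }
    filter {
      # marker field only proves the filter stage ran
      mutate { add_field => { "smoke_test" => "ok" } }
    }
    output { stdout { codec => rubydebug } }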

Thanks for your response!
I mostly see INFO messages. Whenever Logstash raises that FATAL error, it immediately restarts, again and again.

The error is at the bottom of the excerpt below.

{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875898020,"thread":"[l-es]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":22.51}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875898049,"thread":"[l-es]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-es"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875900536,"thread":"[l-ttts]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":24.12}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875900671,"thread":"[l-ttts]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-ttts"}}
{"level":"INFO","loggerName":"logstash.outputs.file","timeMillis":1703875904948,"thread":"[selfmonitor]>worker0","logEvent":{"message":"Opening file","path":"/var/log/logstash/ls-error-2023-12-29.log"}}
{"level":"INFO","loggerName":"logstash.outputs.file","timeMillis":1703875906332,"thread":"[selfmonitor]>worker0","logEvent":{"message":"Closing file /var/log/logstash/ls-exec-2023-12-29.log"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875911670,"thread":"[l-monitor]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":34.29}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875912832,"thread":"[l-monitor]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-monitor"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875917224,"thread":"[l-prodirct]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":30.27}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875918601,"thread":"[l-prodirct]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-prodirct"}}
{"level":"INFO","loggerName":"logstash.outputs.file","timeMillis":1703875921399,"thread":"[selfmonitor]>worker0","logEvent":{"message":"Closing file /var/log/logstash/ls-error-2023-12-29.log"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875926346,"thread":"[l-service]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":39.87}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875926517,"thread":"[l-service]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-service"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875931615,"thread":"Converge PipelineAction::Create<l-logstash>","logEvent":{"message":"Pipeline `l-logstash` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise."}}
{"level":"WARN","loggerName":"org.logstash.instrument.metrics.gauge.LazyDelegatingGauge","timeMillis":1703875931960,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"A gauge metric of an unknown type
(org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team."}}
{"level":"INFO","loggerName":"org.apache.kafka.clients.producer.ProducerConfig","timeMillis":1703875932611,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Idempotence will be disabled because acks is set to 1, not set to 'all'."}}
{"level":"INFO","loggerName":"org.apache.kafka.clients.producer.ProducerConfig","timeMillis":1703875932611,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"ProducerConfig values: \n\tacks = 1\n\tauto.include.jmx.reporter = true\n\tbatch.size = 16384\n\tbootstrap.servers = [br101-hulk-acc.ic.ing.net:9093, br102-hulk-acc.ic.ing.net:9093, br201-hulk-acc.ic.ing.net:9093, br202-hulk-acc.ic.ing.net:9093, br301-hulk-acc.ic.ing.net:9093, br302-hulk-acc.ic.ing.net:9093, br401-hulk-acc.ic.ing.net:9093, br402-hulk-acc.ic.ing.net:9093]\n\tbuffer.memory = 33554432\n\tclient.dns.lookup = use_all_dns_ips\n\tclient.id = producer-18\n\tcompression.type = none\n\tconnections.max.idle.ms = 540000\n\tdelivery.timeout.ms = 120000\n\tenable.idempotence = false\n\tinterceptor.classes = []\n\tkey.serializer = class org.apache.kafka.common.serialization.StringSerializer\n\tlinger.ms = 0\n\tmax.block.ms = 60000\n\tmax.in.flight.requests.per.connection = 5\n\tmax.request.size = 1048576\n\tmetadata.max.age.ms = 300000\n\tmetadata.max.idle.ms = 300000\n\tmetric.reporters = []\n\tmetrics.num.samples = 2\n\tmetrics.recording.level = INFO\n\tmetrics.sample.window.ms = 30000\n\tpartitioner.adaptive.partitioning.enable = true\n\tpartitioner.availability.timeout.ms = 0\n\tpartitioner.class = null\n\tpartitioner.ignore.keys = false\n\treceive.buffer.bytes = 32768\n\treconnect.backoff.max.ms = 50\n\treconnect.backoff.ms = 50\n\trequest.timeout.ms = 40000\n\tretries = 2147483647\n\tretry.backoff.ms = 100\n\tsasl.client.callback.handler.class = null\n\tsasl.jaas.config = null\n\tsasl.kerberos.kinit.cmd = /usr/bin/kinit\n\tsasl.kerberos.min.time.before.relogin = 60000\n\tsasl.kerberos.service.name = null\n\tsasl.kerberos.ticket.renew.jitter = 0.05\n\tsasl.kerberos.ticket.renew.window.factor = 0.8\n\tsasl.login.callback.handler.class = null\n\tsasl.login.class = null\n\tsasl.login.connect.timeout.ms = null\n\tsasl.login.read.timeout.ms = null\n\tsasl.login.refresh.buffer.seconds = 300\n\tsasl.login.refresh.min.period.seconds = 60\n\tsasl.login.refresh.window.factor = 0.8\n\tsasl.login.refresh.window.jitter = 0.05\n\tsasl.login.retry.backoff.max.ms = 10000\n\tsasl.login.retry.backoff.ms = 100\n\tsasl.mechanism = GSSAPI\n\tsasl.oauthbearer.clock.skew.seconds = 30\n\tsasl.oauthbearer.expected.audience = null\n\tsasl.oauthbearer.expected.issuer = null\n\tsasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000\n\tsasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000\n\tsasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100\n\tsasl.oauthbearer.jwks.endpoint.url = null\n\tsasl.oauthbearer.scope.claim.name = scope\n\tsasl.oauthbearer.sub.claim.name = sub\n\tsasl.oauthbearer.token.endpoint.url = null\n\tsecurity.protocol = SSL\n\tsecurity.providers = null\n\tsend.buffer.bytes = 131072\n\tsocket.connection.setup.timeout.max.ms = 30000\n\tsocket.connection.setup.timeout.ms = 10000\n\tssl.cipher.suites = null\n\tssl.enabled.protocols = [TLSv1.2, TLSv1.3]\n\tssl.endpoint.identification.algorithm = https\n\tssl.engine.factory.class = null\n\tssl.key.password = [hidden]\n\tssl.keymanager.algorithm = SunX509\n\tssl.keystore.certificate.chain = null\n\tssl.keystore.key = null\n\tssl.keystore.location = /var/opt/elkaas/logstash/../certs/private/keystore.jks\n\tssl.keystore.password = [hidden]\n\tssl.keystore.type = JKS\n\tssl.protocol = TLSv1.3\n\tssl.provider = null\n\tssl.secure.random.implementation = null\n\tssl.trustmanager.algorithm = PKIX\n\tssl.truststore.certificates = null\n\tssl.truststore.location = 
/var/opt/elkaas/logstash/../certs/truststore.jks\n\tssl.truststore.password = [hidden]\n\tssl.truststore.type = JKS\n\ttransaction.timeout.ms = 60000\n\ttransactional.id = null\n\tvalue.serializer = class org.apache.kafka.common.serialization.StringSerializer\n"}}
{"level":"INFO","loggerName":"org.apache.kafka.common.utils.AppInfoParser","timeMillis":1703875932813,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Kafka version: 3.4.1"}}
{"level":"INFO","loggerName":"org.apache.kafka.common.utils.AppInfoParser","timeMillis":1703875932813,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Kafka commitId: 8a516edc2755df89"}}
{"level":"INFO","loggerName":"org.apache.kafka.common.utils.AppInfoParser","timeMillis":1703875932813,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Kafka startTimeMs: 1703875932812"}}
{"level":"INFO","loggerName":"org.apache.kafka.clients.Metadata","timeMillis":1703875932967,"thread":"kafka-producer-network-thread | producer-18","logEvent":{"message":"[Producer clientId=producer-18] Cluster ID: C3buFw15TMyxlUNS-Y7iNQ"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875933657,"thread":"[l-system]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":53.46}}
{"level":"WARN","loggerName":"logstash.javapipeline","timeMillis":1703875933738,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875934368,"thread":"[l-system]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-system"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875934473,"thread":"Converge PipelineAction::Create<distributor>","logEvent":{"message":"Pipeline `distributor` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise."}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875934560,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Starting pipeline","pipeline_id":"l-logstash","pipeline.workers":1,"pipeline.batch.size":125,"pipeline.batch.delay":50,"pipeline.max_inflight":125,"pipeline.sources":["/var/opt/elkaas/logstash/pipelines/l-logstash.conf"],"thread":"#<Thread:0x3cdc6e28 /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}}
{"level":"WARN","loggerName":"org.logstash.instrument.metrics.gauge.LazyDelegatingGauge","timeMillis":1703875934918,"thread":"[distributor]-pipeline-manager","logEvent":{"message":"A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been created for key: send_to. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team."}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935071,"thread":"[distributor]-pipeline-manager","logEvent":{"message":"Starting pipeline","pipeline_id":"distributor","pipeline.workers":16,"pipeline.batch.size":125,"pipeline.batch.delay":50,"pipeline.max_inflight":2000,"pipeline.sources":["/var/opt/elkaas/logstash/pipelines/distributor.conf"],"thread":"#<Thread:0xccfa465 /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935222,"thread":"[l-phost]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":38.6}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935330,"thread":"[l-phost]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-phost"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935600,"thread":"[distributor]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":0.5}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935601,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Pipeline Java execution initialization time","seconds":0.89}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875935664,"thread":"[l-logstash]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"l-logstash"}}
{"level":"WARN","loggerName":"logstash.javapipeline","timeMillis":1703875935991,"thread":"[l-robt]-pipeline-manager","logEvent":{"message":"'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary"}}
{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1703875936075,"thread":"[l-robt]-pipeline-manager","logEvent":{"message":"Starting pipeline","pipeline_id":"l-robt","pipeline.workers":1,"pipeline.batch.size":125,"pipeline.batch.delay":50,"pipeline.max_inflight":125,"pipeline.sources":["/var/opt/elkaas/logstash/pipelines/l-robt.conf"],"thread":"#<Thread:0x46736e5a /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}}
{"level":"WARN","loggerName":"org.logstash.plugins.pipeline.PipelineBus","timeMillis":1703875936194,"thread":"[distributor]>worker13","logEvent":{"message":"Attempted to send event to 'l4a-distributor' but that address was unavailable. Maybe the destination pipeline is down or stopping? Will Retry."}}
{"level":"FATAL","loggerName":"org.logstash.Logstash","timeMillis":1703875936246,"thread":"[l-logstash]>worker0","logEvent":{"message":"uncaught error (in thread [l-logstash]>worker0)"}}

Note: I recently added the code below to the l-logstash pipeline config, but I am not sure whether it is what is causing the FATAL error.

    # A check added to handle events that have no log level
    #if !([log][level]) {
    #  mutate {
    #    add_field => { "[log][level]" => "INFO" }
    #  }
    #}
    if (([log][level] =~ /(?i)ERROR/) or ([log][level] =~ /(?i)FATAL/)) {
      mutate {
        add_field => { "error_count" => "1" }
        add_field => { "warn_count" => "0" }
      }
      mutate {
        replace => { "[log][level]" => "ERROR" }
      }
    }
    if (([log][level] =~ /(?i)WARN/) or ([log][level] =~ /(?i)WARNING/)) {
      mutate {
        add_field => { "error_count" => "0" }
        add_field => { "warn_count" => "1" }
      }
    }
    mutate {
      convert => { "error_count" => "float" }
      convert => { "warn_count" => "float" }
    }
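    # Aside: events whose level is neither ERROR/FATAL nor WARN/WARNING get no
    # error_count/warn_count at all, and mutate's convert silently skips absent
    # fields. If zeroed counters are wanted on every event, one hypothetical
    # default (sketch only, not part of the pipeline above) would be:
    # if !([error_count]) {
    #   mutate { add_field => { "error_count" => "0" "warn_count" => "0" } }
    #   mutate { convert => { "error_count" => "float" "warn_count" => "float" } }
    # }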
######################################################################
## The check below updates the environment values per NextGen logging

    if (([log_environment] =~ /(?i)P/) or ([log_environment] =~ /(?i)PRD/) or ([log_environment] =~ /(?i)PROD/) or ([log_environment] =~ /(?i)PRODUCTION/)) {
      mutate {
        replace => { "[log_environment]" => "prd" }
      }
    }
    else if (([log_environment] =~ /(?i)A/) or ([log_environment] =~ /(?i)ACC/) or ([log_environment] =~ /(?i)ACCEPTANCE/)) {
      mutate {
        replace => { "[log_environment]" => "acc" }
      }
    }
    else if (([log_environment] =~ /(?i)T/) or ([log_environment] =~ /(?i)TST/) or ([log_environment] =~ /(?i)TEST/)) {
      mutate {
        replace => { "[log_environment]" => "tst" }
      }
    }
    else {
      mutate {
        replace => { "[log_environment]" => "dev" }
      }
    }
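
As an aside on the block above: the patterns are unanchored, so /(?i)P/ matches any value that merely contains a "p" and already covers the longer alternatives like /(?i)PRD/. A tighter variant, sketched here only for illustration, lowercases the field once and compares exact values:

    mutate { lowercase => [ "log_environment" ] }
    if [log_environment] in ["p", "prd", "prod", "production"] {
      mutate { replace => { "[log_environment]" => "prd" } }
    } else if [log_environment] in ["a", "acc", "acceptance"] {
      mutate { replace => { "[log_environment]" => "acc" } }
    } else if [log_environment] in ["t", "tst", "test"] {
      mutate { replace => { "[log_environment]" => "tst" } }
    } else {
      mutate { replace => { "[log_environment]" => "dev" } }
    }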

Thanks
Siva

I don't think it's an issue in the filter; check the inputs and outputs. Check l4a-distributor and l-logstash.

I assume you are sending events from the main conf to other pipelines. Check whether you have the correct address or are sending from the wrong output:

input {
  pipeline {
    address => ....
  }
}
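
For pipeline-to-pipeline wiring, the sender's send_to and the receiver's address must match exactly. A minimal matched pair, with placeholder names, looks like:

    # upstream pipeline
    output { pipeline { send_to => ["my-downstream"] } }

    # downstream pipeline
    input { pipeline { address => "my-downstream" } }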

There must be a pipeline that defines an l4a-distributor pipeline input; if there weren't, Logstash would have failed to start. Which pipeline is it, and what messages does that pipeline log?

@Rios @Badger

Actually, l4a-distributor is the config that transfers events to a specific Logstash pipeline address based on the event module type. Likewise, there are no new config modifications in l-logstash.

Also, the Logstash pipeline address is correct and there is no change in the config.

input { pipeline { address => "l4a-distributor" } }

filter {
# put here statements that must be executed

# event is created by filebeat.

	# save fields we need

	mutate {
		add_field => {
			"[metadata][agent][version]" => "%{[agent][version]}" 
			"[metadata][agent][type]" => "%{[agent][type]}" 
			"[metadata][host][name]" => "%{[host][name]}" 
			"[metadata][host][hostname]" => "%{[host][name]}" 
			"[metadata][asmls][environment]" => "%{[asmls][environment]}" 
			"[event_module]" => "%{[event][module]}" 
			"[event_dataset]" => "%{[event][dataset]}" 
			"[metadata][log][file][path]" => "%{[log][file][path]}" 
			"[metadata][log][offset]" => "%{[log][offset]}" 
			"[metadata][lls]" => "%{[asmls][lls][hostname]}" 
		} 
	}
	if [asmls][role] {
		mutate { add_field => { "[metadata][role]" => "%{[asmls][role]}" } }
	}
	if [asmls][phenv] {
		mutate { add_field => { "[metadata][phenv]" => "%{[asmls][phenv]}" } }
	}
	
	# Delete all unwanted fields
	# do not remove @timestamp. because some logs do not supply timestamp!
	mutate { 
		remove_field => [
			"[@version]",
			"asmls",
			"agent",
			"data_stream",
			"ecs",
			"event",
			"host",
			"input",
			"log"
		]
	}
	
	mutate { rename => { 
		"[event_module]" => "[event][module]" 
		"[event_dataset]" => "[event][dataset]" 
	} } 
## Removed the default log level ([log][level] => INFO) from the mutate below, since the per-module Log Level Unified Schema check was added
	mutate {
		add_field => { 
			"log_domain"       => "cio" 
			"log_type"         => "application" 
			"log_component"    => "pcb_i" 
			"log_environment"  => "%{[metadata][asmls][environment]}" 
			"log_shipper"      => "%{[metadata][host][hostname]}" 
			"log_version"      => "%{[metadata][agent][type]}-%{[metadata][agent][version]}"
			"path"             => "%{[metadata][log][file][path]}"
			"[log][offset]"    => "%{[metadata][log][offset]}"
			"[host][name]"     => "%{[metadata][host][name]}"
			"lls"   		       => "%{[metadata][lls]}"
		}
	}

	if [metadata][role] {
		mutate { add_field => { "[role]" => "%{[metadata][role]}" } }
	}
	if [metadata][phenv] {
		mutate { add_field => { "[phenv]" => "%{[metadata][phenv]}" } }
	}
	
	if [event][module] != "service" {
		ruby {
			code => "begin
						event.set('log_environment', event.get('[metadata][asmls][environment]')[0].downcase)
					rescue
						event.tag('_ruby-l4a-distributor-1')
					end"
		}
	}
    
}
output {

	if [metadata][agent][type] == "filebeat" or [from_otherlls] {
		if      [event][module] == "alert"        { pipeline { send_to => "l-alert" } }
		else if [event][module] == "service"      { pipeline { send_to => "l-service" } }
		else if [event][module] == "beats"        { pipeline { send_to => "l-beats" } }
		else if [event][module] == "es"           { pipeline { send_to => "l-es" } }
		else if [event][module] == "logstash"     { pipeline { send_to => "l-logstash" } }
		else if [event][module] == "monitorasmls" { pipeline { send_to => "l-monitor" } }
		else if [event][module] == "nginx"        { pipeline { send_to => "l-nginx" } }
		else if [event][module] == "performance"  { pipeline { send_to => "l-performance" } }
		else if [event][module] == "process"      { pipeline { send_to => "l-process" } }
		else if [event][module] == "prodirect"    { pipeline { send_to => "l-prodirct" } }
		else if [event][module] == "profileha"    { pipeline { send_to => "l-profileha" } }
		else if [event][module] == "prohost"      { pipeline { send_to => "l-phost" } }
		else if [event][module] == "robt"         { pipeline { send_to => "l-robt" } }
		else if [event][module] == "system"       { pipeline { send_to => "l-system" } }
		else if [event][module] == "ttts"         { pipeline { send_to => "l-ttts" } }
		else if [event][module] == "xf"           { pipeline { send_to => "l-xf" } }
		else if [event][module] == "zoom"         { pipeline { send_to => "l-zoom" } }
		else                                      { pipeline { send_to => "unknown" } }
	}
	else { pipeline { send_to => "unknown" } }

}
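
One more note on this routing: a pipeline output blocks and retries when no running pipeline is listening on the target address, which is what the earlier WARN ("Attempted to send event to 'l4a-distributor' but that address was unavailable") reports. For send_to => "unknown" to drain, some pipeline also has to declare that address; a hypothetical catch-all (the file path is a placeholder) could be:

    # unknown.conf - hypothetical catch-all for unrouted events
    input  { pipeline { address => "unknown" } }
    output { file { path => "/var/log/logstash/unknown-events.log" } }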

Hi @Rios @Badger @leandrojmp,

Another strange thing: I have two Logstash instances with the same config, set up to share the data load in each environment (dev, test, acc).

In dev, there is no such issue and everything works as expected.

In the TEST environment, one of the two Logstash instances is processing all the events and working as expected; I don't see any FATAL errors or "pipeline address was unavailable" warnings there.

But in acceptance, both Logstash instances report the same FATAL errors and never come up. They keep restarting.

Could someone please help me fix this issue?

Thanks
Siva

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.