Logstash plugin: Data load takes three times longer after upgrading from Logstash 6.2.4 to 6.5.2 due to mutate filter

I'm fetching data using the jdbc input plugin for Logstash, and after some mutate conversions and ruby filter calculations, the data is sent to Elasticsearch.
My Logstash config file for 6.2.4 is:

input {

	jdbc {
		jdbc_driver_library => "/home/cloud_as1/logstash/driver/ojdbc7.jar"
		jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
		jdbc_connection_string => ""
		jdbc_user => ""
		jdbc_password => ""
		jdbc_fetch_size => 100000
		last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
		statement => "select * from testdb"
	}
}

filter {

	mutate {

		convert => {
			"field1" => "float"
			"field2" => "integer"
			"field3" => "float"
			"field4" => "integer"
			"field5" => "float"
			"field6" => "float"
			"field7" => "float"
			"field8" => "float"
			"field9" => "float"
			"field10" => "float"
			"field11" => "float"
			"field12" => "float"
			"field13" => "float"
			"field14" => "integer"
			"field15" => "float"
			"field16" => "float"
			"field17" => "float"
			"field18" => "float"
			"field19" => "float"
			"field20" => "float"
			"field21" => "float"
			"field22" => "float"
			"field23" => "float"
			"field24" => "float"
			"field25" => "float"
			"field26" => "float"
			"field27" => "float"
			"field28" => "float"
			"field29" => "float"
			"field30" => "float"
		}

		remove_field => ["message", "host"]
	}

	fingerprint {
		method => "SHA1"
		key => "asdfg"
		source => ["field1", "field2", "field3", "field4", "field5", "field6"]
		concatenate_sources => true
		target => ["[uid]"]
	}

	date {
		match => ["field31", "yyyy, MMM"]
		target => "field32"
	}

	if "_dateparsefailure" in[tags]{
		drop {}
	}

	ruby {
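		# custom field calculations (the script's general shape is sketched after this config)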
		path => "/home/logstash/dataload/conf/calculations/customFields.rb"
	}

	if "_rubyexception" in[tags]{
		drop {}
	}
}
output {
	elasticsearch {
		hosts => "hostname"
		index => "index"
		document_id => "%{[uid]}"
	}
}
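
The ruby filter above points at a script file. Such scripts follow the Logstash ruby filter's file-based API; the sketch below only shows that shape with a made-up calculation, not the contents of my actual customFields.rb:

# Optional: called once at pipeline startup with any script_params.
def register(params)
end

# Required: called for every event; must return an array of events.
def filter(event)
	# Placeholder calculation: derive one custom field from two existing ones.
	event.set("custom_total", event.get("field1").to_f + event.get("field3").to_f)
	return [event]
end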

This process takes about 20 minutes when executed on Logstash 6.2.4.
Using the same config file on Logstash 6.5.2, it surprisingly takes about 2 hours.

NOTE: The environment is the same for both executions.
4 cores, 32 GB RAM, CentOS
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Logstash : 
JVM options
-Xms4g
-Xmx4g

pipeline.batch.size: 2250

Elasticsearch : 
JVM options
-Xms8g
-Xmx8g

There is a template present for the index on Elasticsearch. For float fields, coerce is set to false and null_value is 0.
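
For reference, the float fields in that template are mapped roughly like this (a sketch with placeholder type and field names, not my actual template):

{
	"mappings": {
		"doc": {
			"properties": {
				"field1": {
					"type": "float",
					"coerce": false,
					"null_value": 0
				}
			}
		}
	}
}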

Now, I thought this might be because Logstash was spending the extra time converting my fields to float, so I removed all the explicit float conversions from the config file:

input {

	jdbc {
		jdbc_driver_library => "/home/cloud_as1/logstash/driver/ojdbc7.jar"
		jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
		jdbc_connection_string => ""
		jdbc_user => ""
		jdbc_password => ""
		jdbc_fetch_size => 100000
		last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
		statement => "select * from testdb"
	}
}

filter {

	mutate {
		convert => {
			"field2" => "integer"
			"field4" => "integer"
			"field14" => "integer"
		}

		remove_field => ["message", "host"]
	}

	fingerprint {
		method => "SHA1"
		key => "asdfg"
		source => ["field1", "field2", "field3", "field4", "field5", "field6"]
		concatenate_sources => true
		target => ["[uid]"]
	}

	date {
		match => ["field31", "yyyy, MMM"]
		target => "field32"
	}

	if "_dateparsefailure" in[tags]{
		drop {}
	}

	ruby {
		path => "/home/logstash/dataload/conf/calculations/customFields.rb"
	}

	if "_rubyexception" in[tags]{
		drop {}
	}
}
output {
	elasticsearch {
		hosts => "hostname"
		index => "index"
		document_id => "%{[uid]}"
	}
}

After this, the process takes about 6 minutes on Logstash 6.5.2.

I would like to know the cause of the different data load times across these two Logstash releases. Reading through the change logs for all Logstash versions after 6.2.4, I have not found any changes, fixes, or upgrades related to the mutate filter for floats.
I'm interested to know whether there was any change or fix in the mutate filter's float conversion between 6.2.4 and 6.5.2.
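
One way to isolate the cost of the float conversions from the jdbc input and the Elasticsearch output is a throwaway pipeline like the sketch below (the generator line and field names are placeholders, not my real data), run once with and once without the convert block on each Logstash version:

input {
	generator {
		# One JSON document shaped like a jdbc row; values are placeholders.
		lines => ['{ "field1": "1.23", "field2": "42", "field3": "4.56" }']
		codec => "json"
		count => 1000000
	}
}

filter {
	mutate {
		convert => {
			"field1" => "float"
			"field2" => "integer"
			"field3" => "float"
		}
	}
}

output {
	# Prints one dot per event, so the run time reflects filter throughput only.
	stdout { codec => "dots" }
}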
