Logstash 6.2.2 with JDBC data on not updated

Ongun · April 5, 2018, 2:28pm

Hello all,

I have an ELK Stack 6.2.2 with X-pack trial license. I am running a multiple pipeline configuration with JDBC drivers and sometimes I notice that my data on Elasticsearch is not updated (particularly when multiple pipelines are run simultaneously). I notice this through @timestamp field in the discover portion of Kibana. Currently I have 7 pipelines that I want to run simultaneously and they are all scheduled to run queries against the database in intervals of 5 seconds.

What do you suggest for my situation? Are intervals of 5 seconds too tight and possibly overwhelm the SQL Server? I wonder if somehow the scheduler can be adjusted to wait for the previous pipelines to finish before executing the next scheduled queries.

Thanks in advance

My pipelines.yml file is as follows:

## Never give the full path on Windows
 - pipeline.id: Training
   path.config: "./pipelines/sql_tables_Training.conf"
   
 - pipeline.id: YKBÜ
   path.config: "./pipelines/sql_tables_YKBÜ.conf"
   
 - pipeline.id: YKBÜ Session
   path.config: "./pipelines/sql_tables_YKBÜ_session.conf"
   
 - pipeline.id: Training Processes
   path.config: "./pipelines/sql_Training_processes.conf"
   
 - pipeline.id: YKBÜ Processes
   path.config: "./pipelines/sql_YKBÜ_processes.conf"
   
 - pipeline.id: Training Queues
   path.config: "/Users/ongun.arisev/Downloads/ElasticStack/logstash-6.2.2/pipelines/sql_Training_queues.conf"
   
 - pipeline.id: YKBÜ Queues
   path.config: "./pipelines/sql_YKBÜ_queues.conf"

Each configuration pipeline file is similar and I post here one of them as an example:

input {
	jdbc {
		jdbc_driver_library => "C:\Program Files\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://;integratedSecurity=true"
		schedule => "*/5 * * * * *"
		jdbc_user => "ongun.arisev"
		#statement_filepath => "C:\Users\ongun.arisev\Downloads\ElasticStack\logstash-6.2.2\sqlprocedure.txt"
		statement => "	EXEC [YKBÜ].[dbo].[BPDS_QueueVolumesNow];"
		tags => "YKBÜ_queues"
	}
}

filter {
	mutate {
			add_field => {
				"Veritabani" => "YKBÜ"
			}
	}

}

output {
	# stdout { codec => "rubydebug" }
	elasticsearch{
		hosts => [ "localhost:9200" ]
		index => "sql_queues_ykb"
		document_id => "%{name}"
		user => "logstash_internal"
		password => "x-pack-test-password"
	}
}

PatmanAA · April 5, 2018, 6:48pm

@Ongun I'm really not an expert but maybe is as something to do with
pipeline.workers:

https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
Best regard's
pat

guyboertje · April 5, 2018, 8:21pm

Its hard to understand what you are trying to achieve. The sql statements give no clues. What does the data look like? How does it get updated? How many rows per second are inserted?

Unless you are planning to always ingest all records each time, you need to add some form of progress tracking i.e. WHERE some_id > :sql_last_value.

To be honest, I have never used the jdbc input in a multiple pipeline scenario.

Ongun · April 5, 2018, 8:43pm

The SQL statement (actually a stored procedure) returns a table consisting of like at most 30 rows with 5 columns. Column number of the SQL query result is always the same and row number also mostly does not change. The only dynamic part then are the values in the table, their change rate depends. But I can say that it varies between 2 seconds to 2 minutes.

So this is not a very loaded table. However, when used together with multiple pipelines sometimes data is not updated.

guyboertje · April 5, 2018, 10:49pm

I suppose you are trying to "mirror" the state (as it changes) into ES buy overwriting the ES document with an index per table. Correct?

If so have you considered CDC.

I guess I am saying that it seems to me that you need a more reliable async way of capturing these changes.

Ongun · April 6, 2018, 7:54am

Exactly I overwrite the table each time Logstash runs, I have not considered CDC. Thanks for the suggestion. I was also considering not overwriting but recording the tables together with their timestamps to have time series data.

system · May 4, 2018, 7:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Pipeline Performance & jdbc_static filter Elasticsearch	7	835	December 17, 2018
Logstash pipeline stalling without any error Logstash docker	9	1470	November 24, 2021
Logstash input plugin jdbc schedule issue on k8s Logstash	2	768	March 22, 2021
Logstash jdbc_static filter plugin performing very poorly (nearly not at all) Logstash	27	2498	February 18, 2020
Schedule not working sometimes \| gaps in records Logstash	2	369	March 15, 2022

Logstash 6.2.2 with JDBC data on not updated

Related topics