Logstash 6.2.2 with JDBC: data on Elasticsearch not updated

Hello all,

I have an ELK Stack 6.2.2 with an X-Pack trial license. I am running a multiple-pipeline configuration with JDBC inputs, and I sometimes notice that my data in Elasticsearch is not updated (particularly when multiple pipelines run simultaneously). I notice this through the @timestamp field in Kibana's Discover view. Currently I have 7 pipelines that I want to run simultaneously, all scheduled to run queries against the database at 5-second intervals.

What do you suggest for my situation? Are 5-second intervals too tight, possibly overwhelming the SQL Server? I wonder if the scheduler can somehow be adjusted to wait for the previous pipelines to finish before executing the next scheduled queries.
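One idea (just a sketch, not something I have verified on 6.2.2) is to stagger the cron schedules so the seven pipelines do not all hit SQL Server at the same second. The jdbc input's schedule option accepts a six-field cron expression with a seconds field, and standard cron comma lists let you offset each pipeline:

```
# Pipeline 1: fires at seconds 0, 10, 20, ... of every minute
schedule => "0,10,20,30,40,50 * * * * *"

# Pipeline 2: offset by 5 seconds (fires at seconds 5, 15, 25, ...)
schedule => "5,15,25,35,45,55 * * * * *"
```

Each pipeline then runs every 10 seconds instead of every 5, but at any given second at most half of the pipelines are querying the database.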

Thanks in advance

My pipelines.yml file is as follows:

## Never give the full path on Windows
 - pipeline.id: Training
   path.config: "./pipelines/sql_tables_Training.conf"
 - pipeline.id: YKBÜ
   path.config: "./pipelines/sql_tables_YKBÜ.conf"
 - pipeline.id: YKBÜ Session
   path.config: "./pipelines/sql_tables_YKBÜ_session.conf"
 - pipeline.id: Training Processes
   path.config: "./pipelines/sql_Training_processes.conf"
 - pipeline.id: YKBÜ Processes
   path.config: "./pipelines/sql_YKBÜ_processes.conf"
 - pipeline.id: Training Queues
   path.config: "/Users/ongun.arisev/Downloads/ElasticStack/logstash-6.2.2/pipelines/sql_Training_queues.conf"
 - pipeline.id: YKBÜ Queues
   path.config: "./pipelines/sql_YKBÜ_queues.conf"

Each pipeline configuration file is similar; I post one of them here as an example:

input {
	jdbc {
		jdbc_driver_library => "C:\Program Files\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"
		jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
		jdbc_connection_string => "jdbc:sqlserver://;integratedSecurity=true"
		schedule => "*/5 * * * * *"
		jdbc_user => "ongun.arisev"
		#statement_filepath => "C:\Users\ongun.arisev\Downloads\ElasticStack\logstash-6.2.2\sqlprocedure.txt"
		statement => "EXEC [YKBÜ].[dbo].[BPDS_QueueVolumesNow];"
		tags => "YKBÜ_queues"
	}
}

filter {
	mutate {
		add_field => {
			"Veritabani" => "YKBÜ"
		}
	}
}

output {
	# stdout { codec => "rubydebug" }
	elasticsearch {
		hosts => [ "localhost:9200" ]
		index => "sql_queues_ykb"
		document_id => "%{name}"
		user => "logstash_internal"
		password => "x-pack-test-password"
	}
}

@Ongun I'm really not an expert, but maybe it has something to do with

Best regards


It's hard to understand what you are trying to achieve. The SQL statements give no clues. What does the data look like? How does it get updated? How many rows per second are inserted?

Unless you are planning to always ingest all records each time, you need to add some form of progress tracking, i.e. WHERE some_id > :sql_last_value.
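For reference, a minimal sketch of progress tracking with the jdbc input (the table name, tracking column, and metadata path below are made up for illustration):

```
input {
	jdbc {
		# connection settings omitted
		# :sql_last_value holds the tracked value from the previous run
		statement => "SELECT * FROM some_table WHERE some_id > :sql_last_value"
		use_column_value => true
		tracking_column => "some_id"
		# file where the last tracked value is persisted between runs
		last_run_metadata_path => "./.logstash_jdbc_last_run_some_table"
	}
}
```

With multiple pipelines, each jdbc input should get its own last_run_metadata_path so the pipelines do not overwrite each other's tracking state.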

To be honest, I have never used the jdbc input in a multiple pipeline scenario.

The SQL statement (actually a stored procedure) returns a table of at most about 30 rows with 5 columns. The number of columns in the result is always the same, and the number of rows also mostly does not change. The only dynamic part is the values in the table; their rate of change varies between roughly 2 seconds and 2 minutes.

So this is not a very loaded table. However, when used together with multiple pipelines, the data sometimes is not updated.

I suppose you are trying to "mirror" the state (as it changes) into ES by overwriting the ES documents, with an index per table. Correct?

If so, have you considered CDC (Change Data Capture)?

I guess I am saying that it seems to me that you need a more reliable async way of capturing these changes.

Exactly, I overwrite the table each time Logstash runs. I have not considered CDC; thanks for the suggestion. I was also considering not overwriting, but instead recording the tables together with their timestamps to have time series data.
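If you go the time-series route, one sketch (based on the example output above) is to drop the fixed document_id, so every scheduled run appends new documents instead of overwriting, and to put the date in the index name:

```
output {
	elasticsearch {
		hosts => [ "localhost:9200" ]
		# daily index via the date format sprintf, e.g. sql_queues_ykb-2018.03.15;
		# no document_id, so each run appends snapshot rows with their own @timestamp
		index => "sql_queues_ykb-%{+YYYY.MM.dd}"
		user => "logstash_internal"
		password => "x-pack-test-password"
	}
}
```

Daily indices also make it easy to delete old snapshots by dropping whole indices rather than deleting individual documents.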

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.