Logstash jdbc query issue


(Jamie) #1

I have the latest ELK stack installed and all three are communicating correctly. I'm trying to get logstash to query a 'mysql' database.

my logstash.conf file:

input {
jdbc {
jdbc_driver_library => "/opt/logstash/jdbc/mysql-connector-java-5.1.36-bin.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://192.168.1.5:3306/pacsdb"
jdbc_user => "pacs"
jdbc_password => "******"
statement_filepath => "/opt/logstash/query/arch4.sql"
type => "cd_exams_ripped"
}
}

output {
elasticsearch {
index => "cdrip-index"
}
}

in the path '/opt/logstash/query' I have my 'arch4.sql' file

select distinct se.src_aet as "Ripped By", s.created_time as "Date/Time Sent", p.pat_name as "Patient Name", p.pat_id as "Patient ID", s.accession_no as "ACC #", p.pat_birthdate as "DOB", s.mods_in_study as "MOD", s.study_datetime as "Study Date", s.study_desc as "Study Desc", s.study_custom1 as "Inst Name"
from patient p
INNER JOIN study s
on p.pk = s.patient_fk
INNER JOIN series se
on s.pk = se.study_fk
where s.accession_no like '%OUT%'
and s.created_time >= curdate()

and my logstash.log file

{:timestamp=>"2016-04-28T12:32:38.884000-0500", :message=>"Pipeline main started"}
{:timestamp=>"2016-04-28T12:32:40.100000-0500", :message=>"Pipeline main has been shutdown"}
{:timestamp=>"2016-04-28T12:32:41.908000-0500", :message=>"stopping pipeline", :id=>"main"}
{:timestamp=>"2016-04-28T13:20:06.723000-0500", :message=>"Pipeline main started"}
{:timestamp=>"2016-04-28T13:20:07.758000-0500", :message=>"Pipeline main has been shutdown"}
{:timestamp=>"2016-04-28T13:20:09.743000-0500", :message=>"stopping pipeline", :id=>"main"}

can anyone see anything wrong that I'm doing? odviously, the pipeline for logstash keeps shutting down.


(Magnus Bäck) #2

Unless you've configured a schedule I think the jdbc input is supposed to shut down Logstash after it has made the query once.


(Jamie) #3

Oh really, a schedule. Okay, I'll look into it and try to figure that part out.


(Jamie) #4

Okay, so I enabled the schedule to just query each minute, and now everything is getting duplicated in kibana. I wouldn't expect this behavior.


(Magnus Bäck) #5

The documentation describes how to only grab rows that have been added/updated since the last run.


(Jamie) #6

Thanks. I found where I can use ':sql_last_start' except I'm using a 'statement_filepath' instead of 'statement'.


(Jamie) #7

are you referring to the 'jdbc docs'? https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html


(Magnus Bäck) #8

are you referring to the 'jdbc docs'?

Yes, the documentation of the jdbc input. You're on the right track with :sql_last_start.


(Jamie) #9

So I have added more parameters to my .conf file.

input {
  jdbc {
    jdbc_driver_library => "/opt/logstash/jdbc/mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://192.168.1.5:3306/pacsdb"
    jdbc_user => "pacs"
    jdbc_password => "*****"
	schedule => "* * * * *"
    statement_filepath => "/opt/logstash/query/arch4.sql"
	clean_run => "false"
	record_last_run => "true"
	last_run_metadata_path => "/opt/logstash/lastrun/.logstash_jdbc_last_run"
    type => "cd_exams_ripped"
  }
}

output {
  elasticsearch {
	index => "cdrip-index"
	}
}

I could not use the:sql_last_start in my .conf file since I was calling 'statement_filepath'. All of the examples I have seen so far that use ':sql_last_start' have been using the 'statement' parameter.

So, my file /opt/logstash/lastrun/.logstash_jdb_last_run is getting updated, but the time that is getting written to it is about 5 hours in the future. My timezone is CST and not sure if logstash is set for the wrong timezone perhaps?

All in all, the extra additions to my .conf file are still leading to duplicates in kibana.

What am I doing wrong here to make it duplicate everything? I know there has to be an answer! lol

thanks for all of your help so far.


(Magnus Bäck) #10

I could not use the :sql_last_start in my .conf file since I was calling 'statement_filepath'. All of the examples I have seen so far that use ':sql_last_start' have been using the 'statement' parameter.

You can use statement_filepath with :sql_last_start.

So, my file /opt/logstash/lastrun/.logstash_jdb_last_run is getting updated, but the time that is getting written to it is about 5 hours in the future. My timezone is CST and not sure if logstash is set for the wrong timezone perhaps?

I suppose Logstash uses UTC so you'd have to adapt your query accordingly.

Note that CST is ambiguous; it can either mean Central Standard Time or China Standard Time. Prefer using UTC offsets.


(Jamie) #11

Thanks for your reply. My timezone is Central Standard Time, sorry for the confusion.

I modified my > statement_filepath again to statement_filepath => "/opt/logstash/query/arch4.sql > :sql_last_start

after I restarted logstash I found this in the log file.

{:timestamp=>"2016-05-02T14:25:03.840000-0500", :message=>"Invalid setting for jdbc input plugin:\n\n input {\n jdbc {\n # This setting must be a path\n # File does not exist or cannot be opened /opt/logstash/query/arch4.sql > :sql_last_start\n statement_filepath => \"/opt/logstash/query/arch4.sql > :sql_last_start\"\n ...\n }\n }", :level=>:error}

I tried that before and that is why I assumed that I couldn't use the:sql_last_start with statement_filepath

Can you please tell me how I should type out my 'statement' line?

Thanks


(Magnus Bäck) #12

Something like this:

statement => "SELECT xxx FROM yyy WHERE zzz AND timestamp > :sql_last_start"

But again, the timestamp put into the sql_last_start named parameter is UTC-based so you may have to make a timezone adjustment of your timestamp column.


(Jamie) #13

Thanks for the reply. You had stated earlier that I could use statement_filepath, but I see your just using filepath. I'll put my entire query into the .conf file instead of calling a .sql file and give it a shot. thanks


(Purva Kulkarni) #14

@magnusbaeck could you please give an example as to how to use sql_last_start with a statement_filepath.
I have spent way too much time figuring this out but still haven't
Thank you.


(Jamie) #15

I got it to work, but after every query, your results get dupped. Its not worth it.


(Purva Kulkarni) #16

what would be the solution then? I need to use a .sql file since my query is really big and hence have to use a filepath.
I read your solution with last_run_metadata in input but I cant find the path to lagstash log files either. I am on OS X EI CAPITAN.


(Jamie) #17

Sorry to say that for me, there wasn’t a solution. That is as far as I got. I was able to make it query, but in logstash, all of my results were doubled, tripled, then quad..well you get it. Do you want to see my config that I used to get it to query?

input {

jdbc {

jdbc_driver_library => "/opt/logstash/jdbc/mysql-connector-java-5.1.36-bin.jar"

jdbc_driver_class => "com.mysql.jdbc.Driver"

jdbc_connection_string => "jdbc:mysql://10.196.50.51:3306/pacsdb"

jdbc_user => "my username"

jdbc_password => "my super secret password"

            schedule => "* * * * *"

statement_filepath => "/opt/logstash/query/myquery.sql"

            clean_run => "false"

            record_last_run => "true"

            last_run_metadata_path => "/opt/logstash/lastrun/.logstash_jdbc_last_run"

type => "cd_exams_ripped"

}

}

output {

elasticsearch {

            index => "cdrip-index"

            }

}

I think the answer was supposed to be the line ‘last_run_metadata_path => "/opt/logstash/lastrun/.logstash_jdbc_last_run"’ … but mine seemed to ignore it. I always thought that logstash just didn’t use the correct timezone perhaps.

It would be fantastic to use, but I had to move on to other projects.

Jamie


(Maurice Meaney) #18

You can edit the SQL file referenced in the statement_filepath, just add the line
AND timestamp >= :sql_last_value to the SQL file


(system) #19