Notify Logstash when new data is entered in a MySQL database without using the schedule parameter


(Ankit Khandelwal) #1

I am working with the Elastic Stack and MySQL. Everything works: Logstash reads data from the MySQL database and sends it to Elasticsearch. To update Elasticsearch automatically when new rows are inserted into MySQL, I am using the schedule parameter. But in this case Logstash keeps checking for new data continuously, as I can see in its terminal, and that is my main concern.

input {

  jdbc { 
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    # The user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => ""
    # The path to our downloaded jdbc driver
    jdbc_driver_library => "/home/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Run the query every 15 minutes (cron syntax)
    schedule => "*/15 * * * *"
    use_column_value => true
    tracking_column => 'EVENT_TIME_OCCURRENCE_FIELD'
    # our query
    statement => "SELECT * FROM xyz"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "localhost:9200"
    "index" => "test-migrate"
    "document_type" => "data"
    "document_id"   => "%{personid}"
  }
}

But if the data set is large, Logstash will check the entire table for new entries on every run, with no stopping point. This will reduce scalability and consume more power.

Is there another method, such as a webhook, so that MySQL notifies Logstash only when new data is inserted, or so that Logstash checks only for new entries? Please help.


(David Pilato) #2

Then it's more of a MySQL question IMO, as this would happen on the MySQL side before the Elastic Stack is called.

Note that here you are running:

statement => "SELECT * FROM xyz"

There is no WHERE clause. If your table has something like a timestamp column, you can ask only for the changes made since the last run.
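As a rough sketch of that idea, assuming EVENT_TIME_OCCURRENCE_FIELD (the column named in the original configuration) is a timestamp column on xyz: when use_column_value is left at its default of false, the jdbc input substitutes the time of the previous run for :sql_last_value, so a filtered statement could look something like this. The connection settings are copied from the original post, and nothing here has been tested against that database.

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => ""
    jdbc_driver_library => "/home/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # Run the query every 15 minutes (cron syntax)
    schedule => "*/15 * * * *"
    # use_column_value is not set, so :sql_last_value is the time the
    # query last ran (1 January 1970 on the very first run).
    # Assumption: EVENT_TIME_OCCURRENCE_FIELD is a timestamp column.
    statement => "SELECT * FROM xyz WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value"
  }
}
```

With this shape the first run still reads the whole table, and later runs should only return rows stamped after the previous run. Rows written with an older timestamp would be missed, so this fits best when the column is set at insert or update time.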

But I prefer to modify the application layer. I shared some thoughts at http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/. Maybe that could help.


(Ankit Khandelwal) #3

Thanks for your response. I am using a WHERE clause in my query, as below:

statement => "SELECT * FROM brainplay WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value"

But this does the same thing. Do you get my point?


(David Pilato) #4

So you mean that ALL records are fetched again? Instead of only the ones which changed?


(Ankit Khandelwal) #5

Actually, Logstash is fetching data continuously, and each time it shows all the inserted data on the terminal, which is very costly. I want it to fetch only new entries and update them in Elasticsearch.


(David Pilato) #6

Then something is wrong in your logstash configuration. Check it.

I'm not sure that this is needed:

    use_column_value => true
    tracking_column => 'EVENT_TIME_OCCURRENCE_FIELD'

BTW, printing everything to the console is also something you should only do for debugging purposes, as it slows everything down as well.
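If the tracking column is kept instead, two jdbc input settings may be worth checking. tracking_column_type defaults to "numeric", so a timestamp column needs it set to "timestamp"; and the plugin lowercases column names by default (lowercase_column_names), so tracking_column generally has to be written in lowercase. If either is off, :sql_last_value may never advance and every run would return all rows, which could be one explanation for what is described above. A sketch under those assumptions, reusing the settings from the earlier posts and assuming EVENT_TIME_OCCURRENCE_FIELD is a timestamp column on brainplay:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/testdb"
    jdbc_user => "root"
    jdbc_password => ""
    jdbc_driver_library => "/home/Downloads/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    schedule => "*/15 * * * *"
    # Use the value of the tracking column from the last row read,
    # rather than the time of the last run, as :sql_last_value.
    use_column_value => true
    # Lowercase, because column names are lowercased by default.
    tracking_column => "event_time_occurrence_field"
    # Assumption: the column holds timestamps, not numbers.
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM brainplay WHERE EVENT_TIME_OCCURRENCE_FIELD > :sql_last_value ORDER BY EVENT_TIME_OCCURRENCE_FIELD"
  }
}
output {
  # stdout output left out; add it back only while debugging.
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-migrate"
    document_type => "data"
    document_id => "%{personid}"
  }
}
```

The ORDER BY is there so that the last row read carries the highest timestamp, since that row's value is what gets stored for the next run. The stored value is kept in the file named by last_run_metadata_path, which can be inspected to confirm it is moving forward between runs.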

