Logstash gets the same data from the database


(Ali) #1

Hello,
I'm using Logstash to get data from my database, and everything works fine. But when I change the number of rows in my table, for example by deleting some rows, I still find the same rows as documents in Elasticsearch.

My example: I have 10 rows in my table. I run Logstash and find my 10 rows as documents in ES. I go back to my database, delete 5 rows, run Logstash again, and go back to ES: I still find 10 documents.

Do you have any idea or config that makes Logstash keep ES in line with the table, deleting existing documents whose rows no longer appear in the select query?

Regards!


(Lewis Barclay) #2

You will need to show us some configuration in order for us to help.


(Ali) #3

Hi, this is my pipeline config:

input {
  jdbc {
    jdbc_connection_string => "dburl..."
    jdbc_user => "username"
    jdbc_password => "password"
    jdbc_driver_library => "/opt/elasticsearch/logstash/drivers/ojdbc7.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    schedule => "*/1 * * * *"
    statement => "select * from my_table"
  }
}

filter {
  mutate {
    rename => {
      "numfolder" => "numFolder"
      "creationdate" => "creationDate"
      "stateuser" => "stateUser"
      "isvalid" => "isValid"
      "isbroken" => "isBroken"
    }
    convert => {
      "isValid" => "boolean"
      "isBroken" => "boolean"
    }
  }
}

output {
  elasticsearch {
    index => "users"
    document_type => "infos"
    document_id => "%{no_info}"
    hosts => [ "localhost:9200" ]
  }
}

Thanks


(Lewis Barclay) #4

It doesn't look like you are running your JDBC input on a schedule; are you only doing one request? Also, as far as I can tell, you haven't deleted the old data from ES after deleting the rows from the database?


(Ali) #5

Sorry, I forgot to add the line schedule => "*/1 * * * *". Yes, I'm using the schedule property to get data from the database every minute. My issue is how to make ES drop the old data dynamically: the deletions in the database are triggered by users, so I can't delete the data from ES manually.


(Lewis Barclay) #6

If you only want the current data and no history, you could use the "action => update" setting in the output. You will need to define a unique ID to update against, but since this is a database that shouldn't be an issue!
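
For reference, a minimal sketch of such an output, reusing the index, type and ID field from the config in #3. The doc_as_upsert line is an assumption that is not mentioned in this thread; it is added so that rows not yet indexed are created instead of failing the update. Keep in mind that an update only touches documents whose rows are still returned by the query, so on its own it does not remove documents for deleted rows.

```
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "users"
    document_type => "infos"
    # Unique ID per row, taken from the no_info column as in #3
    document_id => "%{no_info}"
    # Update the existing document instead of the default "index" action
    action => "update"
    # Assumption: create the document when it does not exist yet
    doc_as_upsert => true
  }
}
```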


(Ali) #7

You mean that when the input is a database, "action => update" is set by default?


(Lewis Barclay) #8

Not according to my docs:

I have an idea how to solve it, testing it now.


(Lewis Barclay) #9

I figured out a great solution for this: simply create a second input, using elasticsearch as the input, to pull all records from your index that are more than 2 minutes old. Since you are re-indexing all records every minute, any document that has not been refreshed in that time no longer has a row in the table, so this should be suitable. Add a tag to any returned results.

On your output, create a secondary output for that tag, again using elasticsearch as the output, this time with the delete action.

I've tested this and confirmed it 100% works. Took some effort to think of a solution but it works well.
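
A rough sketch of how this could look, not the exact file that was tested. It assumes the jdbc input and filter from #3 stay unchanged, that each document carries the default @timestamp field set when Logstash indexes it, and that the names "stale" and [@metadata][doc] are only illustrative choices.

```
input {
  # The jdbc input from #3 stays here unchanged.

  # Second input: every minute, fetch documents whose @timestamp is
  # more than 2 minutes old, i.e. not refreshed by the last jdbc runs.
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "users"
    query => '{ "query": { "range": { "@timestamp": { "lt": "now-2m" } } } }'
    schedule => "*/1 * * * *"
    # Keep the _id of each hit so the output knows what to delete
    docinfo => true
    docinfo_target => "[@metadata][doc]"
    # Hypothetical tag name used to route these events in the output
    tags => [ "stale" ]
  }
}

output {
  if "stale" in [tags] {
    # Documents that were not refreshed: remove them from the index
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "users"
      document_type => "infos"
      document_id => "%{[@metadata][doc][_id]}"
      action => "delete"
    }
  } else {
    # Rows coming from the database: index them as in #3
    elasticsearch {
      hosts => [ "localhost:9200" ]
      index => "users"
      document_type => "infos"
      document_id => "%{no_info}"
    }
  }
}
```

The conditional matters: without it, the stale events would also be sent to the normal output and indexed again instead of being deleted.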


(Ali) #10

Thank you for your time. Can you please share your configuration file with me?

Regards!