Logstash: update Elasticsearch when records change in a table

Hello, I have a Users table in my DB, and I have successfully set up Logstash to sync new records. But I also want a way to update the existing records in Elasticsearch.
I don't have any timestamp column in the table to maintain an updated_on value.

Here's the config so far, which inserts new data into Elasticsearch:

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/litmusblox"
    jdbc_user => "postgres"
    jdbc_password => "password"
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * from users where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    last_run_metadata_path => "/usr/share/logstash/jdbc-lib-manual/.logstash_jdbc_last_run"
    jdbc_paging_enabled => true
    tracking_column_type => "numeric"
    schedule => "*/30 * * * * *"
  }
}
output {
  elasticsearch {
    cloud_id => "Test-deployment:xyz"
    cloud_auth => "elastic:pass"
    index => "users"
    document_id => "users_%{id}"
    doc_as_upsert => true
    #user => "elastic"
    #password => "password"
  }
}

I can see two ways to do this.
Either you refresh the whole table every time you sync, because you have no way to track which record in PostgreSQL has changed: "select * from users"

Or you add a date field to your table, and then you can simply do:
select * from users where updated_date > sysdate - 1
(sysdate is Oracle syntax; in PostgreSQL the equivalent would be now() - interval '1 day'.)

Refreshing the whole table won't be a good idea, as the table is huge.
Adding a timestamp column could be a solution for me. Could we also make REST calls from our application code whenever the table changes? Is that the right approach?

I don't see how you would know which record changed in the DB.

If your table is huge, that means you would still have to pull all the records from the DB, then all the records from Elasticsearch, compare both sets, and then update only the ones that changed.

Your best bet is to add a date column in the database.
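If you do go that route, here is a minimal sketch of what it could look like in PostgreSQL. The column, function, and trigger names are just examples; EXECUTE FUNCTION requires PostgreSQL 11+ (use EXECUTE PROCEDURE on older versions):

```sql
-- Add the tracking column (name is an example)
ALTER TABLE users ADD COLUMN updated_on timestamptz NOT NULL DEFAULT now();

-- Stamp the column on every UPDATE so the sync query can find changed rows
CREATE OR REPLACE FUNCTION set_updated_on() RETURNS trigger AS $$
BEGIN
  NEW.updated_on := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_set_updated_on
  BEFORE UPDATE ON users
  FOR EACH ROW
  EXECUTE FUNCTION set_updated_on();
```

With the DEFAULT in place, inserts get a timestamp automatically, and the trigger keeps it current on updates, so application code doesn't need to change.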

So, the final requirement is that my Elasticsearch index should be updated whenever table records change. If this isn't possible with Logstash, can you suggest some other tool? Any tool that can sync with my DB and replicate the data to Elasticsearch?

BTW, thank you for the quick response; you are really helping me with the POC.

What is your POC?

I have good experience with pulling data from databases and keeping it in sync with ELK.

My POC is to create a connection to the DB (Postgres) and update the tables I have indexed in Elasticsearch whenever records are created, updated, or deleted in the DB.

Isn't that exactly what you just did?
The first step is to fully sync the table.

The second part is that every few minutes you run a query with updated > sysdate - interval 'x' minute and update only the changed records in Elasticsearch.
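Assuming a tracking column called updated_on has been added to the table, the jdbc input from the original post would only need a few changes for this second step; a sketch (connection settings and the output section stay as they were):

```
input {
  jdbc {
    # jdbc_connection_string, jdbc_user, etc. as in the original config ...
    statement => "SELECT * FROM users WHERE updated_on > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_on"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/usr/share/logstash/jdbc-lib-manual/.logstash_jdbc_last_run"
    schedule => "*/30 * * * * *"
  }
}
```

On each scheduled run, :sql_last_value is replaced with the highest updated_on seen so far (persisted in last_run_metadata_path), so only rows modified since the previous run are pulled, and the existing document_id plus doc_as_upsert settings overwrite the matching documents.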

But as I mentioned earlier, I want to find a way that doesn't require adding an updated_on column to the table.

I don't see how that is possible.

If you can't track which records were updated in the database, how would you pull only those records? Think about it. If you can come up with a way to do that, well and good; if not, you need an updated_on column.

If your database supports REST calls from stored procedures invoked by a TRIGGER, then you may be able to track updates that way, but that is an SQL question, not a Logstash question.
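For what it's worth, plain PL/pgSQL can't make REST calls by itself (that would need an extension such as http, or an untrusted language like PL/Python). A closely related pattern that works out of the box in Postgres is a trigger that emits a NOTIFY, with a small external listener turning each notification into an Elasticsearch update. A rough sketch, with made-up function, trigger, and channel names:

```sql
-- Publish the changed row's id on a notification channel
CREATE OR REPLACE FUNCTION notify_user_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('users_changed', NEW.id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_notify_change
  AFTER INSERT OR UPDATE ON users
  FOR EACH ROW
  EXECUTE FUNCTION notify_user_change();
```

A separate process would then LISTEN on the users_changed channel, fetch each row by id, and index it into Elasticsearch. Note this still means building and operating that listener yourself.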

I have read about some tools we could use, like PGSync, which does the sync without keeping an updated_on column. It uses the WAL file generated by the Postgres DB; by processing that file it can update Elasticsearch. I will try that.
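For reference, PGSync is driven by a JSON schema file that maps tables to indices; as far as I understand from its docs, the shape is roughly as below (the column list here is made up, and note that PGSync relies on logical decoding, so wal_level = logical must be set in postgresql.conf):

```json
[
  {
    "database": "litmusblox",
    "index": "users",
    "nodes": {
      "table": "users",
      "columns": ["id", "email"]
    }
  }
]
```

Check the PGSync documentation for the exact schema options before relying on this sketch.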

Sure, will check that.

As I said earlier: I don't see how you would know which record changed in the DB without some form of tracking...

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.