Hello, I have a Users table in my DB and I have successfully synced my logstash to update new records. But I want a way to update the existing records on Elasticsearch.
I don't have any timestamp column in the table to maintain updated_on.
Here's the config till now with inserting new data in elastic :
I can see two way to do this.
you refresh whole table every-time you sync because you don't have anyway to track which record in postgresql is changed. "select * from users"
or you add a date field to your table and then you can simple do
select * from users where updated_date > sysdate-1
refreshing the whole table won't be a good idea as the table size is huge.
adding timestamp column could be a solution for me. Can we do REST calls through the code whenever there is changes in table? is this a right approach?
I don't see how you know which record changes in db.
if your table is huge that means you still have to pull all the record from db. and then all record from elasticserach and compare both set. and then update only one which are changed.
So, final requirement is that my Elasticsearch index should get updated with change in table records, can you suggest some other tool by which this is possible if not with logstash? any tool which can syncup with my db and Elasticsearch and replicate the data on Elasticsearch?
My POC is to create a connection with the DB(postgres) and update the tables which I have indexed on Elastic whenever there is update, create, deletion of records in DB
if you can't have tracking on which record is updated in database how would you only pull that record. think about it? if you find a logic to implement well and good. if not then you need updated_on column
If your database supports REST calls from stored procedures called by a TRIGGER then you may be able to track updates that way, but that is an SQL question, not a logstash question.
I have read about some tools which we can use, like PGSync , which will do the syncup without keeping record of updated_on column, it uses WAL file which get's generated by postgres DB, by processing that file it can update ELK. I will try that once.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.