Update previously indexed data

Hello!

Is there any way, with the Logstash JDBC input, to detect whether a previously indexed field value has been modified and, if so, replace the old value with the new one on the next scheduled execution?

Thanks in advance!

Hello Rodrigo,

The best way would be to have a modification date in your table and select, in the jdbc input, all entries whose modification date is greater than the last execution date. Depending on your database, you can fill the modification date in a BEFORE UPDATE trigger, so you would not need to change your application.
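A minimal sketch of such a jdbc input, assuming a MySQL database and a hypothetical backups table (connection settings and schedule are placeholders):

input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"  # placeholder connection
    jdbc_user => "user"                                           # placeholder credentials
    jdbc_password => "password"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    schedule => "*/5 * * * *"                                     # run every 5 minutes
    # :sql_last_value is maintained by the plugin and holds the time of the last run
    statement => "SELECT * FROM backups WHERE modification_date > :sql_last_value"
  }
}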

The elasticsearch output has a setting called action, which is by default configured to index, which means:

Indexes the specified document. If the document exists, replaces the document and increments the version.
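In pipeline terms, that default looks like this; a minimal sketch, with host and index name as placeholders:

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # placeholder host
    index => "backups"                  # placeholder index name
    action => "index"                   # the default: index the document, replacing any existing one with the same ID
  }
}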

Best regards
Wolfram

First of all, thank you for answering my question.

I have a modification date field.

So, if I put
modification_date > :sql_last_value AND date > :sql_last_value
in the jdbc query, and in the output
action => "index"

will this replace the old values with the new ones, or will it create a new entry? If it replaces them, could you tell me how Elasticsearch does that match to avoid the duplicate entry?

Thank you in advance!

Sorry, I forgot that part :frowning: Elasticsearch matches documents by ID. Documents with the same ID will overwrite the existing entries. If you do not configure a custom ID, it will be autogenerated, so it will not overwrite.

If your table contains an ID field, you can use the elasticsearch output's document_id setting directly: Elasticsearch output plugin | Logstash Reference [8.11] | Elastic
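For example, a minimal sketch assuming the table has an id column (host and index name are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # placeholder host
    index => "backups"                  # placeholder index name
    document_id => "%{id}"              # use the table's id column as the document ID
  }
}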

If you do not have an ID field, it is harder, but you could use the fingerprint filter on fields that never change and use this fingerprint as the ID in the output plugin.

Okay. A long time ago I tried to create a unique ID with the fingerprint plugin for the document_id, because the table had no ID field, and it worked perfectly. All documents were indexed with a unique id field, but I don't understand how Elasticsearch matches the document_id, the date, and the value, and, if the value was modified, changes it.

In my project we want to monitor the backups made in the last few days, but sometimes, after a few days, we verify whether they were done correctly. If not, the value of backup_ok is changed manually in the database to the new number of backups, and I do not know how to make Logstash update it automatically.

Thank you, and sorry for the inconvenience!

Can you post an example document with sample data?

I don't know exactly what you want.

We have four fields: backup_ok, backup_req, date, and modification_date. We store sql_last_value with the last date the pipeline was executed and send the data to Elasticsearch with a single output with host and index name, but I don't know how to automatically update the value if a pre-indexed value has been changed.

So, the date is the primary key here, right?
Then you could use the fingerprint filter like this:

filter {
  fingerprint {
    source => ["date"]  # hash the date field, which never changes
    target => "id"      # store the hash in an "id" field
  }
}

This way, it will create an id field from the date field, which you can use in the elasticsearch output. When you update your data in the database, the date field will not be updated, only the modification_date, so the fingerprint will generate the same id.
Therefore, Elasticsearch will detect that a document with this id already exists and will replace it instead of creating a new document.
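Putting it together, the elasticsearch output would then reference the generated id field (host and index name are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # placeholder host
    index => "backups"                  # placeholder index name
    document_id => "%{id}"              # the fingerprint computed from the date field
  }
}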


Thank you!! I will try it.
