Truncate/delete the entire index once new data arrives from Logstash

Hello Guys,

We are preparing a pipeline responsible for capturing the 'current state' of the data in our source table.

What this means: suppose the source table contains 10 records. We would like to reflect those 10 records in Elasticsearch, using Logstash to pull them every 10 minutes. During the day the number of records might change, for example when someone is deleted. So when Logstash next runs and pulls the remaining 9 records, we would like the Elasticsearch index to contain exactly 9 documents. We don't need the old documents, since we only want to see the 'current state'. We've been thinking about a mechanism that truncates or deletes the index before new data is pushed, but I'm not sure how to achieve that using only Logstash and Elasticsearch while making sure that data is always present in the index.
Is that achievable automatically?

Hi Guys,

Does anything come to mind? I know that for updating existing records we can use upsert, but what about deleting documents that no longer exist in the source table? Is that possible with Logstash?

Not really. You can consider different options:

Use a technical temporary table of deleted items. Read that table and delete every document which is referenced in it.
Use a trigger
Modify the application layer (the service layer) and do that in real time. That's my preferred way.
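
To make the first option a bit more concrete, here is a minimal sketch in Python, assuming the IDs have already been read from the table of deleted items. It only builds the newline-delimited JSON body that the Elasticsearch `_bulk` API expects for delete actions. The function name `build_bulk_delete_body`, the index name `current_state` and the sample IDs are hypothetical, and actually sending the body to `_bulk` is left out.

```python
import json


def build_bulk_delete_body(index_name, deleted_ids):
    """Build an NDJSON body of delete actions for the Elasticsearch _bulk API.

    index_name and deleted_ids are illustrative inputs: in practice the IDs
    would come from the table that records deleted items.
    """
    lines = []
    for doc_id in deleted_ids:
        # One action line per document; delete actions carry no source line.
        action = {"delete": {"_index": index_name, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
    if not lines:
        return ""
    # The _bulk API requires every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: records 3 and 7 were deleted in the source table.
    print(build_bulk_delete_body("current_state", [3, 7]), end="")
```

This assumes the Elasticsearch document `_id` matches the primary key of the source row, which is also what makes upserts work.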

I shared most of my thoughts there: Advanced Search for Your Legacy Application - -Xmx128gb -Xms128gb

Hello @dadoonet

Noted. Appreciate it. The rush comes from the fact that we are currently doing a PoC with ELK as a possible data platform, so the more information we get, the faster we can reach the evaluation step. Apologies for that.

As for the proposal, unfortunately the source database cannot be changed in any way. We need to rely on it in its current form.
I'm not yet familiar with triggers, so I'll sink my teeth into the documentation.

