Truncate/delete the entire index once new data arrives from Logstash

(PB) #1

Hello Guys,

We are building a pipeline that is responsible for capturing the 'current state' of the data in our source table.

What that means is that the source table contains, let's say, 10 records. We would like to reflect those 10 records in Elasticsearch, using Logstash to pull them every 10 minutes. During the day, the number of records might change, for example when someone is deleted. So once Logstash runs and pulls the remaining 9 records, we would like Elasticsearch to reflect that with an index containing 9 documents. We don't need the old documents, as we only want to see the 'current state'. We've been thinking about a mechanism that truncates/deletes the index before new data is pushed, but I'm not sure how we could achieve that using only Logstash and Elasticsearch while making sure that data is always present in the index.
Is that achievable automatically?

(PB) #2

Hi Guys,

Does anything come to mind? I know that for updating existing records we can use upsert, but what about deleting documents that no longer exist in the source table? Is that possible using Logstash?

(David Pilato) #3

Read this and specifically the "Also be patient" part.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

We are not all guys, fortunately. I think that "Hi!" is perfectly enough :slight_smile:

Not really. You can consider different options:

Use a technical, temporary table of deleted items. Read that table and delete every document that is referenced in it.
Use a trigger.
Modify the application layer (the service layer) and do the deletion in real time. That's my preferred way.
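
To make the first option more concrete, here is a minimal Python sketch of how IDs read from a "deleted items" table could be turned into a request body for the Elasticsearch `_bulk` API. It is only an illustration under some assumptions: the index name `current_state`, the helper `build_bulk_delete_body`, and the sample IDs are hypothetical, and it assumes each document's `_id` equals the primary key of the source row (for example because the pipeline sets the document ID explicitly).

```python
import json


def build_bulk_delete_body(index_name, deleted_ids):
    """Build an NDJSON body of delete actions for the Elasticsearch _bulk API.

    Assumption: each document's _id matches the primary key of the source row.
    """
    lines = [
        json.dumps({"delete": {"_index": index_name, "_id": str(doc_id)}})
        for doc_id in deleted_ids
    ]
    # The bulk API expects one action per line and a trailing newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical IDs read from the "deleted items" table since the last run.
body = build_bulk_delete_body("current_state", [7, 12])
print(body, end="")
```

The resulting body would be sent with a `POST` to the `_bulk` endpoint using the `application/x-ndjson` content type. Once the deletes succeed, the processed rows can be cleared from the technical table so they are not replayed on the next run.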

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/