How to drop/create index in Logstash config?

We have a scenario where documents remain in the index even though they are no longer present in the source data being fed into it.

Is there a way to delete the index and just re-create it with the latest data?

Or is there a way to do it conditionally, removing old/stale documents by id by setting the output action dynamically? Something like this:

....
    mutate {
        # copy the source's internal id into the field used as the document id
        add_field => {
            "id" => "%{our_internal_id}"
        }
        ...
    }

    # look up the event's id in the existing index
    elasticsearch {
        hosts => ["https://..."]
        index => "my_index"
        query => "_id:%{[our_internal_id]}"
        # docinfo_fields copies hit metadata such as _id; `fields` only copies from _source
        docinfo_fields => { "_id" => "found_id" }
    }

    # if no matching document was found, mark the event for deletion
    if ![found_id] {
        mutate {
            add_field => { "action" => "delete" }
        }
    } else {
        mutate {
            add_field => { "action" => "update" }
        }
    }

}
# end filter
 
output {

    if [action] == "update" {
        # index/update the document, creating it if it does not already exist
        elasticsearch {
            hosts => ["https://..."]
            action => "update"
            ...
            doc_as_upsert => true
            document_id => "%{id}"
            index => "my_index"
        }
    } else {
        # remove the document from the index
        elasticsearch {
            hosts => ["https://..."]
            ...
            action => "delete"
            document_id => "%{id}"
            index => "my_index"
        }
    }
}

If the source DB has a document id that you can use throughout the system (e.g. [our_internal_id]) then that makes things easier.

One approach would be to re-fetch the entire source DB with logstash and write it to a new index, then point an active alias to the new index and delete the old one (outside of logstash).
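Outside of logstash, the swap itself would just be a couple of Elasticsearch API calls. Roughly something like this (the index names are placeholders, and the alias name has to be different from any real index name):

    # each run writes to a brand-new index; searches always go through the alias
    POST _aliases
    {
      "actions": [
        { "remove": { "index": "my_index-old", "alias": "my_index_alias" } },
        { "add":    { "index": "my_index-new", "alias": "my_index_alias" } }
      ]
    }

    # once the alias points at the new index, drop the superseded one
    DELETE my_index-old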

If the source DB supports triggers then you may be able to send a record to logstash to tell it that a DB row has been deleted so that LS can delete it from ES.

You might be able to do it by periodically fetching all the ids from the destination index and testing whether they still exist in the source. Any that do not can be deleted from the destination.

The use case probably has constraints on how quickly new documents must be indexed, how quickly deleted documents must be removed, and how rapidly the source data changes.

If the source supports triggers for updates then this could be quite efficient. If the source does not change very often but you have to pull the entire DB over and over again to detect changes, then it is going to be expensive.

If the source DB does not have a document id field then you can add one that defaults to NULL and populate it in the source when records are added to the destination.

There are definitely other high-level flows, beyond those mentioned above, that could implement this.

Thanks for the reply. The data we're consuming is just a SQL view that we check every few hours, and it only has about 10k rows, so it's not that expensive to load imo. I don't have much control over the backend, so the trigger option won't work.

In the meantime, I've also found these posts. One apparently uses an index template to accomplish this, but gives no details:

Delete and recreate index in elasticsearch - Elastic Stack / Logstash - Discuss the Elastic Stack

And in the other, Delete all documents from specific ES index, that are not in the logstash feed - Elastic Stack / Logstash - Discuss the Elastic Stack, you actually suggested deleting all the documents that don't have the most recent "last updated" timestamp, which makes sense. I just don't know how to approach that.
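Would the idea be to stamp every document with the load time during each run and then, once the run finishes, delete anything that didn't get the newest stamp? Something like this, maybe (the "last_updated" field and the time window are just my guesses):

    # after a load finishes: delete anything not touched by the latest run
    POST my_index/_delete_by_query
    {
      "query": {
        "range": {
          "last_updated": { "lt": "now-6h" }
        }
      }
    }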

These seem more straightforward, but I'm just unsure whether they can be done right inside the pipeline config as well.