Best way to delete all the documents of an Engine

Hello, I need to delete all the documents (~30k) in an Engine.
I saw another post suggesting to list/query all of them and then call DELETE. I tried that, but it's very slow, even when sending batches of 200 documents per DELETE request.
On top of that, after a deletion the search keeps returning the already-deleted documents, so the process gets stuck in a loop.
Either way, this approach is very inefficient.
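For reference, here is a minimal sketch of the list-then-delete loop against App Search's documents API. The host, engine name, and API key are placeholders. One detail that avoids the "deleted documents come back" loop: always re-fetch page 1 instead of paging forward, because deleting documents shifts the pages underneath you.

```python
import json
import urllib.request

# Placeholders: adjust host, engine name, and private key to your deployment.
BASE = "http://localhost:3002/api/as/v1/engines/my-engine/documents"
HEADERS = {"Authorization": "Bearer private-xxxx",
           "Content-Type": "application/json"}

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def _request(method, url, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=HEADERS, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def delete_all():
    while True:
        # Always re-fetch page 1: deletions shift later pages, which is
        # what makes forward pagination return "ghost" documents.
        page = _request("GET", BASE + "/list?page[size]=100")
        ids = [doc["id"] for doc in page.get("results", [])]
        if not ids:
            break
        for batch in chunked(ids, 100):  # App Search caps batch sizes
            _request("DELETE", BASE, body=batch)
```

Even with correct pagination this is one GET plus several DELETEs per hundred documents, so it stays slow for 30k documents, as described above.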

I also tried deleting the engine and recreating it with the same name (version 7.6), but after a couple of minutes it starts responding with error 500 when posting new documents. I would prefer this approach, since it takes far fewer requests and is much quicker.
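The delete-and-recreate flow is just two calls to the engines API. A sketch, with placeholder host and key; the pause before recreating is an assumption on my part (recreating immediately after the delete is a plausible but unconfirmed cause of the 500s), not a documented requirement:

```python
import json
import time
import urllib.request

# Placeholders: adjust host, port, and private key to your deployment.
BASE = "http://localhost:3002/api/as/v1/engines"
HEADERS = {"Authorization": "Bearer private-xxxx",
           "Content-Type": "application/json"}

def engine_url(name):
    return f"{BASE}/{name}"

def _request(method, url, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers=HEADERS, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def recreate_engine(name):
    _request("DELETE", engine_url(name))
    # Assumption: give App Search time to tear down the old indices
    # before reusing the same name.
    time.sleep(10)
    _request("POST", BASE, body={"name": name})
```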

Creating a new engine each time, loading it with all the documents, and then deleting the old engine might be a solution, but it involves complex logic.

Which path do you recommend?

Hello @gpribi :wave:

App Search wasn't designed for something like this: the search on your website/app would be broken/partial while the engine is being emptied. Could you explain the use case a bit more? It would be interesting to learn why you need to delete all the documents in an engine rather than delete the engine itself.

Various engine settings, like curations, would also be affected by emptying the engine, so that's another reason something like this isn't supported.


Sure! I have a marketplace with 30k products. My main database is MySQL, mirrored asynchronously into App Search via API calls for search purposes.

Those products are constantly being created, modified, deleted, rejected, paused, resumed, expiring past their due date, or removed manually because of duplications or account suspensions, among many other operational situations. Many of these happen through my API, but some are still manual updates made directly in the database.

Every API call I receive triggers an App Search API call to apply that change.

Since so many different situations alter the product set, and a lot of modifications still happen directly in MySQL, I want to do a full initial load every day with Logstash, just to be sure that at the beginning of each new day my product set is up to date.
Otherwise, if I miss a DELETE because of a bug, a direct modification in MySQL, or any other reason, that product will stay in my result set forever, and that can be dangerous.
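One alternative to a full wipe-and-reload, sketched under the assumption that you can list the ids on both sides: reconcile the two id sets and only delete the strays, instead of deleting everything. The function name and shape here are hypothetical, not part of any App Search API:

```python
def reconcile(mysql_ids, appsearch_ids):
    """Compute the sync work for a daily reconciliation pass.

    Returns (to_delete, to_upsert): ids present only in App Search are the
    "missed DELETE" strays and must be removed there; ids present in MySQL
    are re-indexed so missed updates are also caught.
    """
    mysql_ids, appsearch_ids = set(mysql_ids), set(appsearch_ids)
    to_delete = sorted(appsearch_ids - mysql_ids)
    to_upsert = sorted(mysql_ids)
    return to_delete, to_upsert
```

With 30k products both id sets fit comfortably in memory, and this avoids any window where the engine is empty, so the MySQL failover would not even be needed.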

I tried using Logstash for ETL instead of App Search API calls, but its output plugin lacks DELETE and PATCH requests (it only pushes/indexes documents), so it falls short for my needs.

A couple of minutes of downtime isn't a problem, since I have a failover search method that goes directly to MySQL while the initial load is running.

Deleting an engine and recreating it wouldn't be a problem. I don't have any curations or synonyms, and the boosts and weights are specified in the JSON requests. The problem is that after deleting the engine and recreating the engine and schema, every POST request starts returning error 500 after a couple of minutes, without any clue in the API logs. The only fix is to destroy the Elasticsearch and App Search containers (I'm using Docker) and recreate them.

I hope that was clear enough.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.