Questions about handling delete operations on a latest-data-only index

Hi everyone,

I have a question regarding our use case following this topic -

In our case we use a single index per customer (without rolling it) and keep its data up to date by indexing or deleting documents when they are changed or deleted.

Our queries are mainly bool-must queries on several document properties. They're very structured and include very little free-text search.

When we index the documents for the first time, we generate the document ids ourselves and save them, so that later updates / deletes can be done by document id instead of by query.

When an update or delete operation needs to take place, we use the _bulk API with index and / or delete operations for the document ids that need to be indexed or deleted (an update is an index operation; we simply overwrite the document).
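
To make the above concrete, here is a minimal sketch of how such a _bulk request body could be assembled. The function name `build_bulk_body`, the index name `customer-a`, and the document ids and fields are all made up for illustration; only the `index` / `delete` action lines and the newline-delimited JSON format come from the _bulk API itself.

```python
import json


def build_bulk_body(index, upserts, delete_ids):
    """Build a newline-delimited JSON body for the _bulk API.

    upserts:    dict mapping document id -> full document source
                (an "index" action overwrites any existing document)
    delete_ids: iterable of document ids to delete
    """
    lines = []
    for doc_id, source in upserts.items():
        # Action line, followed by the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    for doc_id in delete_ids:
        # A delete action has no source line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical example: overwrite one document, delete another.
body = build_bulk_body(
    "customer-a",
    upserts={"doc-1": {"status": "active"}},
    delete_ids=["doc-2"],
)
```

The resulting string would be sent as the body of a POST to `_bulk` with the `Content-Type: application/x-ndjson` header.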

This process happens several times a day (~10), and we're talking about ~10K documents at most per customer index.
It's important to mention that most documents don't change at all, so we skip them, and right now everything is working as expected.

I know that delete operations are not recommended, because a delete doesn't actually remove the document but only marks it as deleted until the segment holding it is merged, but I couldn't find any further information on how this affects performance in the long run.

Given our use case and type of queries (99% structured), and assuming a small volume of deletes (I'd say 10% - 20% of the index's total documents might be deleted once a day), is that still something to worry about? Or should I say: would you consider switching to an index-only (no deletes) approach with rolling indices because of that?

Does it make any difference that we use delete operations in the _bulk API rather than delete_by_query, in terms of how documents are marked for deletion? Or do both yield the same result?

Is there a way to force Elasticsearch to actually delete the document without waiting for the merge?

Right now we don't see any deterioration in performance, but I wonder what we should expect if we keep working the way we do. Slower search performance?

Is there any documentation regarding the deletion process?
I couldn't find anything regarding this in the documentation here -

And if you can share your experience with delete operations, I'd really appreciate it.
How bad is it really? :slight_smile:

Thanks,
Niv

Welcome to our community! :smiley:

TL;DR: for what you are describing, I don't think this is a major issue to worry about. Ideally, make sure you have SSD storage to allow fast merging.

If you were talking about very large deletes, this approach wouldn't make a tonne of sense.

Thank you for your reply warkolm :slight_smile:

Hi warkolm,

Any chance you can elaborate a bit on the implications of very large deletes? Slower overall performance?

Thanks :slight_smile:

If you've got a high number/volume of deletes, it can cause more merge activity than usual, which has an I/O impact that'll likely also show up in indexing and querying.
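
One way to keep an eye on this is to track what share of the documents in an index are deleted but not yet merged away. The sketch below assumes the response shape of `GET /<index>/_stats/docs`, where `_all.primaries.docs.count` is the number of live documents and `_all.primaries.docs.deleted` is the number still only marked as deleted. The function name `deleted_ratio` and the sample numbers are made up for illustration.

```python
def deleted_ratio(stats):
    """Return the fraction of documents that are marked deleted but not yet merged away.

    `stats` is the parsed JSON of an index stats response (assumed shape:
    stats["_all"]["primaries"]["docs"] with "count" and "deleted" keys).
    """
    docs = stats["_all"]["primaries"]["docs"]
    live = docs["count"]        # live (searchable) documents
    deleted = docs["deleted"]   # deleted documents still occupying segments
    total = live + deleted
    return deleted / total if total else 0.0


# Hypothetical sample: 8,000 live documents and 2,000 pending deletes.
sample_stats = {"_all": {"primaries": {"docs": {"count": 8000, "deleted": 2000}}}}
ratio = deleted_ratio(sample_stats)
```

If that ratio stays high for long periods, the force merge API with `only_expunge_deletes=true` is the knob for asking Elasticsearch to merge away segments containing deletes, though at the scale described above normal background merging is probably enough.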

Thank you warkolm
