We are building a search system on top of document-style data (i.e. documents that go through a few states over their lifecycle and get updated over time). Most indexing strategies I see are some form of time-based indexing (rollover API, monthly, daily, etc.). One of the main motivations behind such a strategy seems to be the ability to purge old data easily by deleting whole indices, among a few other benefits. Rather than following this time-based purging pattern, what are the downsides of having fewer indices and continuously deleting documents one at a time as we listen to delete events from an external system? Won't this storage space eventually be reclaimed in the background so we can continue writing to the index?
TL;DR: Is there a strong reason to drop whole indices rather than keep deleting individual documents in order to reclaim storage space? Thanks.
You can do that, but it is extremely costly compared with deleting complete indices.
A deleted document is only marked as deleted at first; its disk space is reclaimed later, when the segment holding it is merged. This could mean slower indexing or searching while merges age the older documents out, which may, for example, require faster SSDs (which has a cost).
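To make the cost difference concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions: `Segment`, `ToyIndex`, and all of their methods are hypothetical names invented for illustration, not Elasticsearch or Lucene APIs, and real merge policies are far more selective than the "merge everything" step shown here. The point it tries to show is that a per-document delete only marks the document, the space comes back when the surviving documents are rewritten by a merge, and dropping a whole index rewrites nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for an immutable index segment (hypothetical, not a real API)."""
    docs: dict                                  # doc_id -> size in bytes
    deleted: set = field(default_factory=set)   # ids marked as deleted

    def disk_bytes(self):
        # Documents marked as deleted still occupy disk until a merge rewrites the segment.
        return sum(self.docs.values())


class ToyIndex:
    """Toy index: a list of segments plus a counter of bytes rewritten by merges."""

    def __init__(self):
        self.segments = []
        self.bytes_rewritten = 0

    def add_segment(self, docs):
        self.segments.append(Segment(dict(docs)))

    def delete_doc(self, doc_id):
        # A delete only marks the document; nothing is removed from disk here.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def disk_bytes(self):
        return sum(seg.disk_bytes() for seg in self.segments)

    def merge_all(self):
        # Simplification: merge every segment into one, copying only live documents.
        live = {}
        for seg in self.segments:
            for doc_id, size in seg.docs.items():
                if doc_id not in seg.deleted:
                    live[doc_id] = size
        self.bytes_rewritten += sum(live.values())
        self.segments = [Segment(live)] if live else []

    def drop(self):
        # Dropping the whole index discards its segments without rewriting anything.
        self.segments = []


# One long-lived index with per-document deletes.
index = ToyIndex()
index.add_segment({f"a{i}": 100 for i in range(10)})
index.add_segment({f"b{i}": 100 for i in range(10)})
before = index.disk_bytes()

for i in range(5):
    index.delete_doc(f"a{i}")
after_delete = index.disk_bytes()   # unchanged: deletes are only markers

index.merge_all()
after_merge = index.disk_bytes()    # space is reclaimed only now
merge_cost = index.bytes_rewritten  # live data that had to be copied

# A time-based index that is simply dropped when it ages out.
monthly = ToyIndex()
monthly.add_segment({f"m{i}": 100 for i in range(10)})
monthly.drop()
drop_cost = monthly.bytes_rewritten
```

In this toy run the disk usage does not change after the deletes, only after the merge, and the merge has to copy all of the surviving data, whereas dropping the monthly index copies nothing. That copying is the extra I/O that competes with indexing and searching.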