Clarification on `docs.deleted` with regard to data deletion compliance requirements

My organization uses Elasticsearch to store and index information for one of our services. We have an internal compliance requirement to delete a user's data whenever that user requests it. To comply, we listen for data deletion events and call the delete API on all documents associated with the user when a deletion request comes in. However, we recently learned that when the delete operation is called, the data is internally only marked as deleted and is still persisted in storage. It is not fully removed from the cluster until an automatic merge operation runs. We have also noticed that the `docs.deleted` metric has been rising steadily in our cluster over the past year.

From our research, we don't think there is any way to recover or search for these "soft-deleted" documents through an API. However, to verify that we are in compliance with this legal requirement, we want to understand whether there is any feasible way for these "soft-deleted" documents to be retrieved or read. Hoping someone can provide additional clarification here. Thanks!
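
For anyone wanting to quantify the situation described above, here is a minimal Python sketch of how the `docs.deleted` count could be read from the index stats API (`GET /<index>/_stats/docs`) and how a force merge limited to expunging deletes (`POST /<index>/_forcemerge?only_expunge_deletes=true`) could be requested instead of waiting for an automatic merge. The cluster URL, the index name `users`, and the helper names are assumptions for illustration only, and authentication/TLS are left out.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster. Adjust URL and add auth as needed.
ES_URL = "http://localhost:9200"


def fetch_stats(index: str) -> dict:
    """Call GET /<index>/_stats/docs and return the parsed JSON response."""
    with urllib.request.urlopen(f"{ES_URL}/{index}/_stats/docs") as resp:
        return json.load(resp)


def deleted_ratio(stats: dict) -> dict:
    """Map each index name to the fraction of primary-shard docs marked deleted."""
    ratios = {}
    for name, body in stats.get("indices", {}).items():
        docs = body["primaries"]["docs"]
        total = docs["count"] + docs["deleted"]
        ratios[name] = docs["deleted"] / total if total else 0.0
    return ratios


def forcemerge_request(index: str) -> urllib.request.Request:
    """Build (but do not send) a force merge request that only expunges deletes."""
    url = f"{ES_URL}/{index}/_forcemerge?only_expunge_deletes=true"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    # Hypothetical index name; requires a reachable cluster.
    print(deleted_ratio(fetch_stats("users")))
    # To actually trigger the merge:
    # urllib.request.urlopen(forcemerge_request("users"))
```

Note that, as far as I understand it, `only_expunge_deletes` only rewrites segments whose share of deleted documents exceeds the `index.merge.policy.expunge_deletes_allowed` threshold, so `docs.deleted` may not drop to zero afterwards. A force merge is also I/O-heavy, so treat this as something to try on a test cluster first.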

The full documents themselves are not accessible via any Elasticsearch API. You would need to write some very low-level code and have access to the data server to get near that.
While the full documents may be inaccessible, individual terms from soft-deleted docs (e.g. lastname:smith) may remain in the inverted index. These could appear as autocomplete suggestions, e.g. in Kibana, which uses the terms enum API (`_terms_enum`) to inspect the indexed terms. There may still be a switch in Kibana to turn off use of that API.
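
To check whether a term from a deleted document is still visible this way, something like the following Python sketch could be used. It builds a terms enum request (`POST /<index>/_terms_enum` with `field` and `string` in the body) and filters the returned `terms` list against values you know should be gone. The cluster URL, the index `users`, the field `lastname`, and the helper names are all assumptions for illustration, and authentication is left out.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster. Adjust URL and add auth as needed.
ES_URL = "http://localhost:9200"


def terms_enum_request(index: str, field: str, prefix: str, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a terms enum request for terms starting with `prefix`."""
    body = json.dumps({"field": field, "string": prefix, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_terms_enum",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def leaked_terms(response: dict, sensitive: set) -> list:
    """Return the sensitive values that still show up in a terms enum response."""
    return sorted(t for t in response.get("terms", []) if t in sensitive)


if __name__ == "__main__":
    # Hypothetical index, field and value; requires a reachable cluster.
    req = terms_enum_request("users", "lastname", "smi")
    with urllib.request.urlopen(req) as resp:
        print(leaked_terms(json.load(resp), {"smith"}))
```

A non-empty result does not prove the term comes from a deleted document, since a live document may contain the same value. It only shows the term is still present in the index.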
