Looking for advice on how to get rid of all deleted documents in a 4TB index on a monthly basis. This index is not a time-series index, and it sees lots of deletes and writes all day long.
These are the doc counts from _stats:
"primaries" : {
  "docs" : {
    "count" : 4377351691,
    "deleted" : 1276015486
  },
"total" : {
  "docs" : {
    "count" : 8754708022,
    "deleted" : 2565553924
  },
When I look at the _segments output, I can see some segments with ~40% of their documents deleted. Here is a sample:
https://gist.githubusercontent.com/ofrivera/a16d67cfa4e3b59c21db1a2b4e8615c7/raw/754e1d46ab468e448dbe40e3b0a3d6208f1706b0/gistfile1.txt
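(In case it helps to reproduce the check: the per-segment deleted-doc counts can also be pulled with the cat segments API; "my_index" below is a placeholder for the real index name.)
GET _cat/segments/my_index?v&h=index,shard,segment,docs.count,docs.deleted,size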
Some options I'm considering:
- Create a copy of the index (via snapshot/restore), make it read-only, expunge deletes, then sync the deltas back.
- _forcemerge, but which options do you suggest, especially to avoid ending up with huge segments? (See the example calls after this list.)
- Or just being more aggressive with index.merge.scheduler.max_thread_count; any suggestions?
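To make the question concrete, these are the kinds of calls I have in mind ("my_index" is a placeholder, and the thread-count value is only an illustration, not something I've settled on):

Only merge segments that contain deletes (above the expunge_deletes_allowed threshold), instead of merging everything down to a handful of huge segments:
POST /my_index/_forcemerge?only_expunge_deletes=true

Or raise the merge scheduler's thread count on the live index (this should be a dynamic index setting, if I read the docs right):
PUT /my_index/_settings
{
  "index.merge.scheduler.max_thread_count": 4
}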
 
Some details about cluster setup:
Specs
AWS i3.4xlarge
16 nodes (all data nodes)
120 GB RAM per node
16 vCPUs per node
Health
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 16,
  "number_of_data_nodes" : 16,
  "active_primary_shards" : 629,
  "active_shards" : 1258,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
OS
CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Nov 30 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
RPM:
elasticsearch-6.4.2.rpm
Details:
elasticsearch-6.4.2-1.noarch
JAVA:
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Thanks!