Deleting documents by date

Deleting individual documents by date, rather than deleting a whole index, is a lot like running a SQL DELETE FROM table WHERE date < YYYY.MM.dd instead of a SQL DROP TABLE. The first performs potentially millions or billions of atomic row-level operations, whereas the second is a single operation. Which is more performant?
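To make the SQL analogy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (logs_mixed, logs_old) and the sample dates are invented for illustration only. It shows just the shape of the two approaches, not their relative cost on a real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables holding the same 30 days of time-series rows.
conn.execute("CREATE TABLE logs_mixed (date TEXT, message TEXT)")
conn.execute("CREATE TABLE logs_old (date TEXT, message TEXT)")
rows = [("2015.01.%02d" % day, "event") for day in range(1, 31)]
conn.executemany("INSERT INTO logs_mixed VALUES (?, ?)", rows)
conn.executemany("INSERT INTO logs_old VALUES (?, ?)", rows)

# Row-level approach: every matching row has to be found and removed.
# Zero-padded YYYY.MM.dd strings compare correctly as plain text.
cur = conn.execute("DELETE FROM logs_mixed WHERE date < ?", ("2015.01.15",))
deleted_rows = cur.rowcount

# Whole-table approach: a single statement, however many rows exist.
conn.execute("DROP TABLE logs_old")
remaining_tables = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
]
```

The DELETE touches each matching row, so its work grows with the data, while the DROP TABLE is one statement regardless of table size.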

In terms of operational efficiency, you end up needing a lot of extra read/write operations to actually delete things in a Lucene index. First, you have to find what to delete (read operation). Then you need to mark the documents for deletion (write operation). Then, at the next segment merge, Lucene has to find the documents marked for deletion (read operation), and make new segments without those documents (write operation).
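The four steps above can be sketched as a toy model that simply counts reads and writes. This is not how Lucene is implemented (a real query uses the inverted index rather than scanning every document), and the function name and data layout are hypothetical. It only illustrates that deleted documents are touched again at merge time.

```python
def delete_by_query_cost(segments, cutoff):
    """Toy model: segments is a list of lists of 'YYYY.MM.dd' date strings.

    Returns (merged_segment, reads, writes) for deleting docs older than cutoff.
    """
    reads = writes = 0
    marked = [set() for _ in segments]

    # Steps 1 and 2: find matching documents (read), mark them deleted (write).
    for seg_no, segment in enumerate(segments):
        for doc_id, doc_date in enumerate(segment):
            reads += 1
            if doc_date < cutoff:
                marked[seg_no].add(doc_id)
                writes += 1

    # Steps 3 and 4: at merge time, read every document again and
    # write only the survivors into a new segment.
    merged = []
    for seg_no, segment in enumerate(segments):
        for doc_id, doc_date in enumerate(segment):
            reads += 1
            if doc_id not in marked[seg_no]:
                merged.append(doc_date)
                writes += 1

    return merged, reads, writes


segments = [["2015.01.01", "2015.01.02"], ["2015.01.03", "2015.01.04"]]
merged, reads, writes = delete_by_query_cost(segments, "2015.01.03")
```

In this model every document is read twice and every document costs one write, either a deletion mark or a copy into the merged segment.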

Operational efficiency is not the only concern, though. Segment fragmentation will be your enemy over time, robbing you of valuable storage efficiency. See http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html for a brilliant view into what happens with segment deletes (which come with document deletes) and how they interrupt the normal, efficient, tiered segments of a Lucene index.

This is why we don't recommend delete_by_query as a solution for time-series data.
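The counterpart of DROP TABLE for time-series data is removing a whole index at once. As a minimal sketch, assuming indices are named with a prefix followed by a YYYY.MM.dd date (the "logstash-" prefix and the function name here are assumptions, not something stated above), the expired indices can be picked out by name alone, without reading a single document:

```python
from datetime import date, datetime


def indices_older_than(index_names, cutoff, prefix="logstash-"):
    """Return the names whose date suffix (YYYY.MM.dd) is before cutoff."""
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a time-series index we manage
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired


names = [
    "logstash-2015.01.01",
    "logstash-2015.01.02",
    "logstash-2015.01.03",
    "kibana-int",
]
expired = indices_older_than(names, date(2015, 1, 3))
```

Each name in the result can then be deleted as a whole index, which is a single operation per index instead of one per document.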