We receive a stream of data every second, but we only want to keep the data from the past T time (say, 1 hour). What is the best way to expire and remove old data? After some research, we found the following two approaches:
The first approach is to set the TTL of each document to T, so that ES automatically blacklists old data and removes it. One question we have is when, and how frequently, the data is physically removed. Is this controlled by indices.ttl.interval or by something else?
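To make the "when and how often" question concrete, here is a minimal model of the first approach. It assumes that expired documents are deleted by a background sweep that runs every `interval` seconds, starting at `first_sweep`. The question suggests `indices.ttl.interval` plays this role, but that is an assumption here. `Doc`, `purge_time` and `lag` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    indexed_at: int  # seconds since an arbitrary epoch


def purge_time(doc: Doc, ttl: int, interval: int, first_sweep: int = 0) -> int:
    """Time of the first purge sweep at or after the document expires.

    Assumption (not confirmed by the post): sweeps run at
    first_sweep, first_sweep + interval, first_sweep + 2 * interval, ...
    """
    expires_at = doc.indexed_at + ttl
    if expires_at <= first_sweep:
        # Already expired before the first sweep: removed by that sweep.
        return first_sweep
    # Ceiling division: number of whole intervals needed to reach expires_at.
    k = -(-(expires_at - first_sweep) // interval)
    return first_sweep + k * interval


def lag(doc: Doc, ttl: int, interval: int, first_sweep: int = 0) -> int:
    """How long an expired document survives before a sweep deletes it."""
    return purge_time(doc, ttl, interval, first_sweep) - (doc.indexed_at + ttl)


# Example: T = 1 hour, sweep every 60 seconds.
doc = Doc("a", indexed_at=10)
print(purge_time(doc, ttl=3600, interval=60))  # 3660
print(lag(doc, ttl=3600, interval=60))         # 50
```

Under this model an expired document can stay visible for up to one `interval` after its TTL runs out. Also, a Lucene delete only marks a document as deleted; its disk space is reclaimed later, when segments are merged. So "physically removed" happens later still.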
The second approach is to use time-based indices, creating a new index for each time frame of length T. However, this approach might produce very strange tf-idf scores for the latest index while it still contains very little data. Is there a good way to handle this?
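The rotation logic of the second approach can be sketched as follows. It assumes hourly frames (T = 1 hour), UTC timestamps, and a hypothetical naming scheme `events-YYYY.MM.DD.HH`. The prefix, format string and function names are illustrative, not taken from the post. An index is considered expired once its whole frame ended at or before `now - T`, so it can be dropped in one step instead of deleting documents one by one.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical naming scheme: one index per hour, e.g. "events-2024.01.31.13".
INDEX_PREFIX = "events-"
BUCKET = timedelta(hours=1)  # length of each time frame (here equal to T)
FMT = "%Y.%m.%d.%H"


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its hour (only valid for 1-hour buckets)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def index_for(ts: datetime) -> str:
    """Name of the index an event with timestamp ts should be written to."""
    return INDEX_PREFIX + bucket_start(ts).strftime(FMT)


def expired_indices(names, now: datetime, retention: timedelta) -> list:
    """Indices whose entire frame ended at or before now - retention.

    `now` must be timezone-aware UTC; names without the prefix are ignored.
    """
    cutoff = now - retention
    expired = []
    for name in names:
        if not name.startswith(INDEX_PREFIX):
            continue
        start = datetime.strptime(name[len(INDEX_PREFIX):], FMT).replace(
            tzinfo=timezone.utc
        )
        if start + BUCKET <= cutoff:
            expired.append(name)
    return sorted(expired)


now = datetime(2024, 1, 31, 13, 20, tzinfo=timezone.utc)
print(index_for(now))  # events-2024.01.31.13
existing = ["events-2024.01.31.11", "events-2024.01.31.12", "events-2024.01.31.13"]
print(expired_indices(existing, now, retention=timedelta(hours=1)))
# ['events-2024.01.31.11']
```

When the frame length equals T, searches have to span the current and the previous index, and between T and 2T of data is retained at any moment. Shorter frames reduce this over-retention, at the cost of more indices.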