Automatically deleting documents from indices

Hi there! Since I started using ELK I have become more curious, so I keep exploring, and my questions keep multiplying. The biggest challenge in log analysis is data size. My question: is there any way to have older documents deleted from indices automatically? I have already found a mechanism that adds new documents to my indices automatically, so automatic deletion of old documents would be very helpful.

What sort of use case are you using the stack for?

Currently I am using it for logging and log analysis, and for full-text search along with visualization.

Are you using time based indices?

Yes, I am using time-based indices.

Then just delete the whole index instead of individual documents.

Take a look at ILM (index lifecycle management) as well; it might be a better fit.

I am using pipelines.yml, so whenever new input arrives it is added to an index dynamically. But space is the real issue: we can't have unlimited space, and these logs grow rapidly and eat all of it. So I thought that if I could automatically delete the old documents from my index, no one would have to worry about space. Previously I looked at Curator, but it deletes the whole index. Then I tried the delete API, which can delete specific documents, but only manually.

For time series data, you should still use rollover indices and delete entire indices. ILM is a great way to do this automatically based on index/shard size constraints. If you still need to delete indices according to space constraints, you could fall back to Curator instead of ILM, but you'd be responsible for running Curator periodically, using cron or something similar.
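To make the ILM suggestion concrete, here is a minimal sketch of what such a policy body could look like, assuming an Elasticsearch version that includes ILM. The policy name logs_cleanup and the thresholds (50gb, 1d, 30d) are illustrative assumptions, not values from this thread; the script only builds and prints the request, it does not contact a cluster.

```python
import json


def build_ilm_policy(max_size, max_age, delete_after):
    """Build the body for PUT _ilm/policy/<name>.

    The hot phase rolls over to a new index when either limit is hit;
    the delete phase removes an index once it is older than delete_after
    (measured from rollover).
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Hypothetical policy name and thresholds, chosen only for illustration.
policy = build_ilm_policy(max_size="50gb", max_age="1d", delete_after="30d")
print("PUT _ilm/policy/logs_cleanup")
print(json.dumps(policy, indent=2))
```

The printed body can be pasted into Kibana Dev Tools or sent with any HTTP client. The policy still has to be attached to an index template with a rollover alias before it takes effect.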

So there is no way to periodically delete old documents from time-based indices?

The whole point of having time-based indices is that you can delete complete indices, which is much more efficient than deleting documents individually from an index.
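As a rough illustration of why whole-index deletion is simple with daily indices, the sketch below picks out the indices that have aged past a retention window purely from their names. It assumes names of the form logstash-YYYY.MM.dd; the helper name expired_indices and the 7-day window are hypothetical, and the script only lists candidates rather than deleting anything.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logstash-", fmt="%Y.%m.%d"):
    """Return the daily indices whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices, leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # suffix is not a date, skip it
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = [
    "logstash-2019.05.01",
    "logstash-2019.05.08",
    "logstash-2019.05.09",
    ".kibana",  # non-matching names are skipped
]
print(expired_indices(names, today=date(2019, 5, 9), keep_days=7))
# prints ['logstash-2019.05.01']
```

Each name returned can then be removed with a single DELETE /<index> request, which drops the whole index at once instead of marking documents as deleted one by one.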

But if we delete the index, our visualizations will also be deleted.

An index pattern typically matches a set of time-based indices, so deleting the oldest index simply removes the oldest data, and visualisations should not be affected.

No, actually I deleted an index and all the visualizations related to that index were deleted too.

All Kibana visualisations are kept in a separate index so that should not be the case. Can you please describe your setup and exactly what you did?

OK, a few more questions.

  1. I am importing logs via Logstash using pipelines.yml. Once my index is created, I can update it dynamically (I don't shut Logstash down) just by dropping logs into the specified location.
    Now, once my index is deleted, if I add new logs to that location afterwards, will a new index be created automatically?

What does your Logstash config look like? How are you creating indices? Time-based indices are typically used with immutable data, which makes me wonder whether you really are using them.

Shall I paste pipelines.yml, or do you need to see the logstash.conf file?

I am interested in the Elasticsearch output section of the conf files.

This is the apache.conf file:
input {
  file {
    path => "/elk/Weblog/*"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip { source => "clientip" }
}
output {
  elasticsearch {
    hosts => ["10.11.109.7:9200"]
    index => "logstash_apchelogs"
  }
}

If you are specifying a single index name, you are not using time-based indices, which is what Logstash creates by default, e.g. logstash-2019.05.09. In that case deleting the index will delete all data. In such a scenario you probably need to use delete-by-query to delete documents, but be aware this is much more expensive than deleting a complete time-based index.
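For the single-index case, a delete-by-query request could be sketched as below. This assumes the documents carry the @timestamp field that Logstash adds by default; the 30-day cutoff and the helper name delete_older_than_body are illustrative assumptions. The script only prints the request so it can be reviewed before being run against a cluster.

```python
import json


def delete_older_than_body(days, timestamp_field="@timestamp"):
    """Build a _delete_by_query body matching documents older than `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    # "now-30d" is Elasticsearch date math: 30 days before the current time.
    return {"query": {"range": {timestamp_field: {"lt": f"now-{days}d"}}}}


body = delete_older_than_body(30)
print("POST /logstash_apchelogs/_delete_by_query")
print(json.dumps(body, indent=2))
```

Something like this would have to be scheduled (for example with cron) to run periodically. The cheaper long-term fix is to switch the output to a dated name, e.g. index => "logstash_apchelogs-%{+YYYY.MM.dd}", so that old data can be dropped one whole index at a time.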