Hello, I'm still coming up to speed with ELK. My use case is analyzing AWS CloudFront logs in aggregate (such as total bandwidth for a folder in a bucket per month). These logs are about 5,000 KB compressed per day.
Based on this article https://www.elastic.co/blog/index-vs-type, it sounds like having many indices is not ideal. However, I see that the default index pattern is "logstash-%{+YYYY.MM.dd}", which creates a new index every day.
Could someone explain the reasoning behind having a daily index?
It makes retention management easier.
Historically, tools like Kibana also used the index name to work out which indices they needed to query when building visualisations.
Eventually you won't want to keep every single log record; logs older than X days will have to be removed.
Elasticsearch doesn't handle deleting individual documents within an index very well, though. It isn't fast, and deleted records hang around in the background taking up disk space: a deleted document isn't actually reclaimed until its segment is merged.
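You can see this for yourself: the docs.deleted column of the cat indices API shows how many deleted-but-unmerged documents each index is still carrying. A quick check (a sketch assuming Elasticsearch on localhost:9200 and the default index pattern; adjust to your setup):

```
# list doc counts, deleted-doc counts, and on-disk size per index
curl -XGET 'localhost:9200/_cat/indices/logstash-*?v&h=index,docs.count,docs.deleted,store.size'
```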
Deleting an entire index, however, is fast and doesn't leave behind those invisible deleted records taking up valuable storage.
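With daily indices, retention becomes one cheap delete call per expired day. For example (hypothetical index name, assuming the default daily pattern):

```
# drop a whole day's worth of logs in one operation
curl -XDELETE 'localhost:9200/logstash-2015.06.01'
```

Tools like Elasticsearch Curator exist to automate exactly this kind of time-based cleanup.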
You do want to be smart about it, though. If you have to keep 365 days' worth of data and you only get 1,000 records per day, it may make more sense to use a monthly or yearly index instead. That cuts down on the number of indices and segments your cluster has to manage.
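If monthly indices fit your volume better, the change is just the date pattern in the Logstash elasticsearch output. A minimal sketch (default hosts assumed):

```
output {
  elasticsearch {
    # one index per month instead of one per day
    index => "logstash-%{+YYYY.MM}"
  }
}
```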