Reasoning behind a daily logstash index?

chrisan · October 12, 2016, 12:36am

Hello, I'm still coming up to speed with ELK. My use case is to analyze AWS CloudFront logs in aggregate (for example, total bandwidth for a folder in a bucket per month). These logs are about 5,000 KB compressed per day.

Based on this article, https://www.elastic.co/blog/index-vs-type, it sounds like having many indices would not be ideal. However, I see the default index is "logstash-%{+YYYY.MM.dd}", which creates an index for each day.

Could someone explain the reasoning behind having a daily index?

warkolm · October 12, 2016, 12:42am

It makes retention management easier.
Historically, things like Kibana also used to use the index name to figure out which indices it needed to query to build visualisations.
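
To illustrate the point about index names, here is a minimal Python sketch of how a client could work out which daily indices cover a time range. The function name daily_indices and the dates are made up for illustration; only the "logstash-" prefix and the YYYY.MM.dd layout come from the default pattern quoted above.

```python
from datetime import date, timedelta


def daily_indices(start: date, end: date, prefix: str = "logstash-") -> list[str]:
    """Return the daily index names covering start..end (inclusive).

    Hypothetical helper: mirrors the default "logstash-%{+YYYY.MM.dd}"
    naming, one index per calendar day.
    """
    days = (end - start).days
    return [f"{prefix}{start + timedelta(days=i):%Y.%m.%d}" for i in range(days + 1)]


# A three-day query window only needs to touch three indices.
print(daily_indices(date(2016, 10, 10), date(2016, 10, 12)))
```

With date-based names, a query for a short window can skip every index outside that window instead of searching everything.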

bhatch · October 20, 2016, 10:20pm

Eventually you won't want to keep every single log record. Logs older than X days will have to be removed.
Elasticsearch doesn't delete data within an index very well, though. It isn't very fast, and deleted records can linger in the background, taking up disk space (deleted docs are only marked as deleted and are not physically removed until their segment is merged).
Deleting an entire index, however, is fast and does not leave invisible records taking up valuable storage.
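
As a rough sketch of what retention by index looks like, the Python below picks out the daily indices that have aged past a retention window. The helper name expired_indices, the sample index list, and the 7-day window are assumptions for illustration; it only selects names and does not talk to a cluster.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logstash-"):
    """Return the daily index names whose date is older than keep_days.

    Hypothetical helper: names that don't match prefix + YYYY.MM.dd
    are skipped rather than treated as errors.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matched but the suffix is not a daily date
        if day < cutoff:
            expired.append(name)
    return expired


names = ["logstash-2016.10.01", "logstash-2016.10.15", "logstash-2016.10.20", ".kibana"]
print(expired_indices(names, today=date(2016, 10, 20), keep_days=7))
```

Each name returned would then be dropped as a whole index (the delete index API, DELETE /<index-name>), which is the cheap operation described above. Tools such as Curator automate this kind of selection.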

You do want to be smart about it, though. If you have to keep 365 days' worth of data and you only get 1,000 records per day, it may make more sense to use a monthly or yearly index instead. That will cut down on the number of indices and segments your cluster has to manage.
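
To put numbers on that trade-off, here is a small Python sketch that counts how many indices a 365-day retention window spans under daily, monthly, and yearly naming. The end date and the index_count helper are arbitrary choices for illustration, and the strftime patterns stand in for the equivalent Logstash date patterns.

```python
from datetime import date, timedelta

# strftime stand-ins for daily, monthly, and yearly index suffixes (assumed naming).
PATTERNS = {"daily": "%Y.%m.%d", "monthly": "%Y.%m", "yearly": "%Y"}


def index_count(end, retention_days, pattern):
    """Count the distinct index suffixes touched by the retention window."""
    names = {(end - timedelta(days=i)).strftime(pattern) for i in range(retention_days)}
    return len(names)


for label, pattern in PATTERNS.items():
    print(label, index_count(date(2016, 10, 20), 365, pattern))
```

For this window the counts come out to 365 daily, 13 monthly, or 2 yearly indices. The catch with coarser indices is that retention also gets coarser: you can only drop a whole month or year at a time.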
