Indices or types when different types need different retention times?

wokmichel · March 9, 2017, 10:13am

Hello,

I need some advice on my design of indices / types in the following context:

I have set up a single Elasticsearch server (for now; I may add nodes later) for several purposes:

In Kibana, be able to search the logs of multiple applications gathered from multiple servers with Filebeat. Those logs are, for the most part, parsed into structured documents with Logstash.
In Kibana, be able to produce visualizations and dashboards for those applications from said structured data parsed with Logstash and from Metricbeat data.
In Kibana, produce metrics dashboards from data gathered by Metricbeat.

I have 3 applications A, B and C with different kinds of logs for each. As a consequence, I have "mapped" the log kinds to Elasticsearch types, for instance:

A has types A1, A2 and A3
B has types B1, B2 and B3
C has types C1, C2 and C3

Those types all have different mappings which share very few fields (basically only the timestamp and the Filebeat default fields).

I have chosen the following indices / types setup:

Application A logs go to daily rolling (via Logstash) indices called A-YYYY.mm.dd
Application B logs go to daily rolling (via Logstash) indices called B-YYYY.mm.dd
Application C logs go to daily rolling (via Logstash) indices called C-YYYY.mm.dd

I have also set up index aliases so that alias A points to the A* indices, and likewise for B and C.

The part where it gets tricky is that I want to keep some information longer than other information.
That means, for instance, that I want to delete all type A1 documents older than 7 days from the A indices (Metricbeat metricsets, for instance) but I want to keep A3 type documents for at least 300 days for legal reasons.

I was thinking about using Curator for this kind of maintenance but realized that Curator only acts on whole indices and cannot go down to the document type level.
I then thought about using the delete by query API on the aliases but have read that it is a very bad practice.

My question is simple:
Is my setup suitable for what I want to achieve here, or do I have to design all this differently so that each type goes into its own indices, even though that will create hundreds if not thousands of indices over a period of more than 1 year?

Thanks a lot in advance for your help

Christian_Dahlqvist · March 9, 2017, 12:45pm

As deleting documents from an index is 'expensive', it is generally recommended that data with different retention periods is placed into different indices, so that retention and data deletion can be handled by dropping entire indices. You also generally want to store logs that have a similar structure (assuming the same retention period) in the same index. This also means that you can have weekly or even monthly indices for data that arrives in lower volumes but needs to be kept for a long time, and thereby reduce the overall shard count.
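
To make the "drop entire indices" idea concrete, here is a minimal sketch of how expired indices could be picked out by name alone. It is not Curator and does not talk to Elasticsearch; the function name `expired_indices`, the prefixes and the retention table are all made up for illustration, and it assumes daily indices whose names end in a `YYYY.MM.DD` date.

```python
from datetime import date, datetime, timedelta

# Hypothetical retention policy in days, keyed by index-name prefix.
# Prefixes are lowercase because Elasticsearch index names must be lowercase.
RETENTION_DAYS = {"a-a1": 7, "a-a3": 300}


def expired_indices(index_names, today, retention_days=RETENTION_DAYS):
    """Return the daily indices whose date suffix is past their prefix's retention."""
    expired = []
    for name in index_names:
        # Split "a-a1-2017.03.01" into prefix "a-a1" and suffix "2017.03.01".
        prefix, _, suffix = name.rpartition("-")
        if prefix not in retention_days:
            continue  # no policy for this prefix: leave the index alone
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # name does not end in a YYYY.MM.DD date
        if day < today - timedelta(days=retention_days[prefix]):
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["a-a1-2017.03.08", "a-a1-2017.03.01", "a-a3-2017.03.01"]
    print(expired_indices(names, today=date(2017, 3, 9)))
```

The point is that the retention decision depends only on the index name, so each expired index can be removed with a single index deletion instead of deleting documents one by one.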

wokmichel · March 9, 2017, 2:21pm

Thanks for your answer,
Does that mean that it would make more sense to create, for instance, daily indices for each type, like so:

A-A1-YYYY.mm.dd
A-A2-YYYY.mm.dd
A-A3-YYYY.mm.dd
B-B1-YYYY.mm.dd
B-B2-YYYY.mm.dd
B-B3-YYYY.mm.dd

and so on...
so that I can enforce retention by simply deleting A-A1 indices older than 7 days while keeping A-A3 indices for 300 days?

Thanks.

Christian_Dahlqvist · March 9, 2017, 2:31pm

It may, but you want to ensure that you do not end up with a lot of small indices/shards, so it is possible that data with a longer retention period should go into monthly indices while other types of data use daily indices. Exactly which types of data should or should not share an index requires knowledge of the data, so I would never recommend a standard pattern across all data, as I have virtually no information about this.
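
As a rough illustration of the daily versus monthly trade-off, the sketch below builds index names at either granularity and estimates how many indices of one data type exist at the same time. The helper names are hypothetical, a month is approximated as 30 days, and the estimate only translates directly into a shard count if each index has a single primary shard.

```python
from datetime import date


def index_name(prefix, day, granularity):
    """Build a time-based index name; granularity is 'daily' or 'monthly'."""
    if granularity == "daily":
        return f"{prefix}-{day:%Y.%m.%d}"
    if granularity == "monthly":
        return f"{prefix}-{day:%Y.%m}"
    raise ValueError(f"unknown granularity: {granularity}")


def approx_index_count(retention_days, granularity):
    """Rough number of indices kept at once for one data type."""
    days_per_index = {"daily": 1, "monthly": 30}[granularity]
    # Ceiling division, plus one for the index currently being written to.
    return -(-retention_days // days_per_index) + 1


if __name__ == "__main__":
    today = date(2017, 3, 9)
    for granularity in ("daily", "monthly"):
        print(index_name("a-a3", today, granularity),
              approx_index_count(300, granularity))
```

Under these assumptions, 300 days of retention means about 301 daily indices but only about 11 monthly ones. The price of monthly indices is coarser deletion: a whole month can only be dropped once its newest document is past the retention period.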

wokmichel · March 9, 2017, 3:33pm

Of course, I understand that how the data should be organized really depends on the data itself.
Still, I gather from what you say that the general idea is to limit the number of shards.
I should add that, in my case, the indices only have one primary shard each.

Anyway, thanks a lot for your help.

system · April 6, 2017, 3:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
