Help me understand the use case for indices

Hello,

I'm a new user of Elasticsearch, Kibana, Logstash and Filebeat, but I have managed to install an environment and can send logs from my servers to Logstash, which parses and forwards them to Elasticsearch. So far so good.

I'm almost ready to put this setup into production - but am missing some information to ensure I've done everything the "right way".

We will be using the Elastic Stack as a centralized log server (partly as a kind of backup, but also so our operations team can access logs across all systems daily).

The part I am most in doubt about is indices. What is best practice here? Will one fit all, or should I split them by server, log type, or something else?

Right now I've been testing with a single index pattern named logstash-*.

We will be shipping a wide variety of log files to the Elastic Stack (Rails logs, Debian logs, Java logs and also custom logs).

Please advise on some good resources and use cases for when to split data into separate indices and when not to.


Hi @mriber,

the concept of an index is explained in the documentation. Briefly, an index is a collection of documents and acts as the scope for mappings and many other document-related settings. Indices are subdivided into shards, which are distributed across the cluster for load balancing and resilience. By default, Logstash creates an index for every day with the name logstash-[YYYY.MM.DD], but this can be changed in the Logstash configuration. Indices can be deleted efficiently and therefore make for a good unit of log rotation, e.g. using Curator.
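
For reference, this is roughly what the elasticsearch output section of a Logstash pipeline looks like (a sketch; hosts and the index pattern shown are the plugin defaults, so adjust them to your setup):

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        # The default pattern creates one index per day, e.g. logstash-2017.03.15;
        # change this setting to change how documents are partitioned into indices.
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }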

The logstash-* you mentioned is a wildcard pattern that tells Elasticsearch to query across all indices whose name starts with logstash-. That way, even though your log entries are separated into daily indices, you can still query across all of them.
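
For example, a search like the following (a sketch for the Kibana Dev Tools console, assuming your events carry the usual message field) runs against every daily index at once:

    GET logstash-*/_search
    {
      "query": {
        "match": { "message": "error" }
      }
    }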

So if you expect to frequently query for log entries within one day and one type of log, it would make sense to create an index per day and type. You're also free to change that index partitioning scheme at any point in the future if certain usage patterns emerge.

Hi @weltenwort,

Thank you very much for your response.

So given that I will be using Elasticsearch for cross-server, cross-project and cross-application centralized logging (and thus many different log formats), it would make sense for me to use a different index pattern...

logstash-%{project}-%{doc_type}-%{hostname}-[YYYY.MM]

This will give me the following options:

  1. Searching EVERYTHING (by adding an index pattern logstash-*)
  2. Searching across a certain project, e.g. production-application, test-application or sysadmin (for all log files relating to sysadmin stuff), by adding an index pattern like logstash-production-app-*
  3. Better options for deleting data per host/application, for example if we add a new file type to be logged.

Will this make sense in an Elastic Stack setup, or is it "overcomplicating" things?
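
In Logstash terms I imagine the output would look roughly like this (just a sketch; it assumes my filter stage sets project, doc_type and hostname fields on every event):

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        # Assumes [project], [doc_type] and [hostname] are set by the filters;
        # the monthly date part uses Logstash's %{+YYYY.MM} sprintf syntax.
        index => "logstash-%{project}-%{doc_type}-%{hostname}-%{+YYYY.MM}"
      }
    }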

This approach will give you a very large number of small shards, which is very inefficient, as each shard comes with some overhead in terms of file handles and memory usage. To get the most from your cluster, make sure that you have reasonably large shards; an average shard size between a few GB and a few tens of GB is quite common.
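
You can inspect how many shards you have and how large they are with the cat APIs, for example (a sketch; this column selection is just one way to spot many small shards):

    GET _cat/shards?v&h=index,shard,prirep,store&s=store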

So it will not be a problem to have "different" kinds of data in the same index?

E.g. I will be sending the Debian auth.log, which contains information like hostname, pid and action, while I'm also sending a Rails log which contains a lot more info (like clientip, response, parameters, request url and so on).

You can have different types of data in a shared index as long as you do not have conflicting mappings for common fields. It is generally recommended to keep similar data with the same retention period in the same index. Exactly what is considered similar, however, varies from case to case.
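
To illustrate the mapping conflict with a hypothetical example (the index and field names are made up, and the exact error message varies by version): the first document to arrive fixes the mapping of a field, and later documents that use the same field name with an incompatible type are rejected.

    # The first event maps pid as a number:
    PUT shared-logs/doc/1
    { "pid": 1234 }

    # A later event with a non-numeric pid in the same index is rejected
    # with a mapper_parsing_exception:
    PUT shared-logs/doc/2
    { "pid": "worker-1" }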


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.