Best Practices for Efficient and Effective Log Storage and Retrieval with Elasticsearch and Logstash?

Hello,

I am currently working with Elasticsearch to store our log files. I have followed all the necessary steps (Filebeat -> Logstash -> Elasticsearch) and I am ready to deploy the system for testing before deploying it on a larger scale.

I have a few questions regarding Elasticsearch and Logstash:

  1. I am planning to use the index name pattern {testlevel}-{casename}-{start time}, with one index per log file. This means there will be thousands of indexes once the pattern is deployed. I am concerned that having this many indexes could cause performance issues on the Elasticsearch server. What are the best practices for efficient and effective log storage and retrieval, and how can I avoid potential issues with a large number of indexes?

  2. In the Logstash server, I have created a configuration file that contains code for managing messages. I am wondering if including too many filter plugins in this configuration will have a negative impact on the efficiency of the input and output. Should I consider limiting the content in this block to optimize performance?

filter {
  # Copy the source file path, split it on "/" and keep the last element (the file name)
  mutate { add_field => { "logpath" => "%{[log][file][path]}" } }

  mutate { split => { "logpath" => "/" } }

  mutate { add_field => { "logname" => "%{[logpath][-1]}" } }

  # Split the file name on "_" and take the second element as the case name
  mutate { split => { "logname" => "_" } }

  mutate { add_field => { "casename" => "%{[logname][1]}" } }

  ......
  ......
}

That sounds like a very bad idea. Having large volumes of small indices and shards will cause performance and stability problems and will not scale well. I would instead recommend setting up a data stream or a set of time-based indices and storing all the data there. When you do this, make sure that all the information you were going to put into the index name is instead stored as indexed fields, so you can search and filter on it.


Hi @Christian_Dahlqvist

When you do this, make sure that all the information you were going to put into the index name is instead stored as indexed fields, so you can search and filter on it.

Can I understand this as follows: I have an index called 2023-3-17 (a time-based index), store all the logs generated on 2023-3-17 in this index, and add some tags (indexed fields) to filter on?

So:
time-based indices = indices named by date?
indexed fields = tags = fields in the documents of the index?

Ok, forgive me. I didn't know that both "indexes" and "indices" are acceptable plural forms of the word "index".

Create either a data stream or a set of time-based indices.

A data stream will create a series of backing indices, each containing data indexed during a specific period. You index into an alias that writes to the latest backing index, while queries run across all of them.
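If you are on a reasonably recent Logstash version, the elasticsearch output plugin can write straight into a data stream. A minimal sketch of what that could look like (the dataset and namespace values are just placeholders, and hosts/credentials are omitted):

output {
  elasticsearch {
    # Writes into the data stream "logs-testlogs-default";
    # the dataset and namespace values below are placeholders.
    data_stream           => "true"
    data_stream_type      => "logs"
    data_stream_dataset   => "testlogs"
    data_stream_namespace => "default"
  }
}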

Another way is to have Logstash create time-based indices. You can do this by specifying the index name as follows:

output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM.dd}"
  }
}

Irrespective of which method you choose, make sure in Logstash that the data you want to be able to filter on is stored in the actual events, e.g.:

{
  "testlevel": "x",
  "casename": "y",
  "start_time": "2023-03-17T03:17:00Z",
  ...
}
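If the file name itself encodes these values, e.g. a made-up layout like smoke_login_20230317T031700.log, a single grok filter could populate all three fields instead of a chain of mutate filters:

filter {
  # Assumes a hypothetical file name layout <testlevel>_<casename>_<start time>.log
  # where testlevel and casename contain no underscores.
  grok {
    match => { "[log][file][path]" => "(?<testlevel>[^/_]+)_(?<casename>[^/_]+)_(?<start_time>[^/.]+)\.log$" }
  }
}

That way testlevel, casename and start_time become ordinary fields on every event and can be used for searching, filtering and aggregations.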

Hi @Christian_Dahlqvist

Thanks for sharing these methods. I will probably use the time-based indices.

In the Logstash server, I have created a configuration file that contains code for managing messages. I am wondering if including too many filter plugins in this configuration will have a negative impact on the efficiency of the input and output. Should I consider limiting the content in this block to optimize performance?

Do you have any suggestions for this one? :smile: :smile:

To be more specific, I need to add a lot of filter configuration to the Logstash config file to extract the data I need as tags (test level, case name, case params, etc.) to filter on in the index. I wonder if this will cause any problems :thinking: :thinking: :thinking:

Logstash is designed for complex data transformation, so having a large number of filters is not necessarily going to be a problem or dramatically affect performance. If you write very inefficient filters, though, your Logstash performance can suffer. Over the years I have seen a fair few Logstash users make very inefficient use of the grok filter plugin, so that is something to be aware of.
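As a rough illustration (the log line format here is made up), anchoring the pattern with ^ and $ and using specific grok patterns instead of several DATA or GREEDYDATA captures lets grok reject non-matching lines quickly instead of backtracking through the whole message:

filter {
  # Assumes a hypothetical line format like "2023-03-17T03:17:00Z INFO some message".
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}$" }
  }
}

For simple delimiter-separated formats, the dissect filter is also a cheaper alternative to grok.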


Thanks @Christian_Dahlqvist. This is helpful to me!
