I am currently working with Elasticsearch to store our log files. I have followed all the necessary steps (Filebeat -> Logstash -> Elasticsearch) and I am ready to deploy the system for testing before deploying it on a larger scale.
I have a few questions regarding Elasticsearch and Logstash:
I am planning to use the index pattern {testlevel}-{casename}-{start time}, so that each log file maps to its own index. This means there will be thousands of indexes once the mapping pattern is deployed. I am concerned that having this many indexes could cause performance issues on the Elasticsearch server. What are the best practices for efficient and effective log storage and retrieval, and how can I avoid potential issues with a large number of indexes?
In the Logstash server, I have created a configuration file that contains code for managing messages. I am wondering whether including too many filter plugins in this configuration will have a negative impact on the efficiency of the input and output. Should I consider limiting the content of this block to optimize performance?
That sounds like a very bad idea. Having large volumes of small indices and shards will cause performance and stability problems and will not scale well. I would instead recommend setting up a data stream or a set of time-based indices and storing all the data there. When doing this, make sure that all the information you were going to put in the index name is stored as indexed fields so you can search and filter on it.
"When doing this, make sure that all the information you were going to put in the index name is stored as indexed fields so you can search and filter on it."

Can I understand this as follows: I have an index called 2023-3-17 (a time-based index), all the logs generated on 2023-3-17 are stored in this index, and some tags (indexed fields) are added so I can filter on them?

So:
time-based indices = indexes?
indexed fields = tags = fields in the documents of the index?
Ok, forgive me. I didn't know that both "indexes" and "indices" are acceptable plural forms of the word "index".
Create either a data stream or a set of time-based indices.
A data stream will create a series of backing indices, each containing data indexed during a specific period. You index into an alias that writes to the latest index and can query across all the backing indices.
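As a rough sketch of the data stream approach, the Logstash elasticsearch output can write to a data stream directly; the host, dataset and namespace values below are just placeholders, and the data stream options require a reasonably recent version of the plugin:

    output {
      elasticsearch {
        hosts                 => ["http://localhost:9200"]   # placeholder host
        data_stream           => "true"                       # write to a data stream instead of a plain index
        data_stream_type      => "logs"
        data_stream_dataset   => "mytests"                    # example dataset name
        data_stream_namespace => "default"
      }
    }

With these example values the events would go to a data stream named logs-mytests-default, and Elasticsearch manages the backing indices for you.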
Another way is to have Logstash create time-based indices. You can do this by specifying the index name as follows:
    output {
      elasticsearch {
        # one index per day; the date is taken from the event @timestamp
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }
Irrespective of the method chosen, you then make sure in Logstash that you store the data you want to be able to filter on in the actual events, e.g.:
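A sketch of what that could look like in the filter block, assuming the pieces of your old naming pattern are available; the field names testlevel, casename and starttime are just examples based on your pattern, and in practice you would usually extract them from the log line or file path with grok or dissect rather than hard-code them:

    filter {
      # Keep the values you would have encoded in the index name as fields
      # on each event so they can be searched and filtered on later.
      mutate {
        add_field => {
          "testlevel" => "smoke"             # example value; normally parsed from the event
          "casename"  => "login_test"        # example value
          "starttime" => "%{@timestamp}"     # copy the event timestamp as an example
        }
      }
    }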
Thanks for sharing these methods. I will probably use the time-based indices.
In the Logstash server, I have created a configuration file that contains code for managing messages. I am wondering whether including too many filter plugin configurations will have a negative impact on the efficiency of the input and output. Should I consider limiting the content of this block to optimize performance?
Do you have any suggestions for this one?
To be more specific, I need to add a lot of filter configuration to the Logstash config file to extract the data I need as tags (test level, case name, case params, etc.) to filter on within the index. I wonder if this will cause any problems.
Logstash is designed for complex data transformation, so having a large number of filters is not necessarily going to be a problem or dramatically affect performance. If you write very inefficient filters, though, your Logstash performance can suffer. I have seen a fair few Logstash users over the years who have made very inefficient use of the grok filter plugin, so that is something to be aware of.
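To illustrate that point with a made-up log format (not from the original post): an unanchored grok pattern with several greedy captures forces a lot of backtracking, especially on lines that do not match, while anchoring the pattern, or using dissect for strictly delimited messages, is typically much cheaper:

    filter {
      # Slow: unanchored and built from greedy DATA/GREEDYDATA captures
      # grok { match => { "message" => "%{DATA:testlevel}-%{DATA:casename}-%{GREEDYDATA:rest}" } }

      # Faster: anchor the pattern to the start of the line and use specific patterns
      grok {
        match => { "message" => "^%{WORD:testlevel}-%{WORD:casename}-%{TIMESTAMP_ISO8601:starttime}" }
      }

      # For well-delimited messages, dissect is usually cheaper than grok
      # dissect { mapping => { "message" => "%{testlevel}-%{casename}-%{starttime}" } }
    }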