You do not necessarily need to create separate indices per host and application. If you add fields to the log entries that identify the host and application, they can often share a single index, assuming they have the same retention period. Having a large number of very small shards can be very inefficient, so make sure you understand the trade-offs before deciding how to split your data across indices.
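For illustration only (the field name and value here are placeholders, not taken from your setup), tagging each event could be done with a mutate filter in Logstash before indexing:

```
filter {
  mutate {
    # The shipper (e.g. Filebeat) normally sets the host field already;
    # "application" and the value "billing-api" are just placeholders.
    add_field => { "application" => "billing-api" }
  }
}
```

You can then filter searches on the application and host fields instead of having to keep the data in separate indices.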
Dear Christian, I am not sure that the logs contain information about the host and application. So my first thought was to create indices for every application host and every application on it. I have at least 30-40 application hosts, with approximately 5-6 applications on each.
If you have enough information available to separate them into different indices, you can surely add fields to the log entries based on that same information instead? With log data you generally create time-based indices, and if you are going to have separate indices per time period for all those applications and hosts, I would expect you to start suffering from having too many shards quite quickly.
As I understand it, an index is like a database, a mapping is like a schema, a document is like a row, fields are like columns, and a type is like a table.
So, if I have a lot of application logs, I think I should create one index named application_logs, then one schema named logmaster, and then one type named logs.
Four questions:
1. How do I set up the above-mentioned correctly?
2. How do I prevent Elasticsearch from creating additional indices and types? I need Elasticsearch to index every log entry as a separate document. How do I set this up correctly?
3. Can Elasticsearch handle such a large type?
4. How do I generate an ID for every document that is understandable for the user?
I have heard this analogy before, and it can be misleading and lead to wrong decisions. Elasticsearch is not a relational system, so don't make decisions based on how you would do this in a relational database.
Fields in an index need to have a defined mapping across the entire index, and it is no longer possible to have the mapping differ depending on type. Therefore, place logs that are similar in structure together in the same indices. This means that indices are more likely to be grouped by application type rather than by host. Make sure that you have reasonably large shards, as having lots of small ones is inefficient. Use time-based indices, and consider going for monthly indices if the retention period is long and/or the data volume is low.
Elasticsearch will create indices as data is indexed into them, so this is typically managed in Filebeat and/or Logstash.
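As a minimal sketch (the host and index name are just examples), requesting a daily index from the Logstash side could look like this:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Elasticsearch creates e.g. application_logs-2017.03.21 automatically
    # the first time an event with that date is indexed.
    index => "application_logs-%{+YYYY.MM.dd}"
  }
}
```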
I am not sure I understand what you mean here?
Elasticsearch will automatically assign an ID if none is given. Why do they need to be understandable for the user?
A common pattern for a centralised logging solution is to gather logs using Beats and then enrich and parse them in Logstash and/or the new Elasticsearch ingest node feature. These log records are then typically indexed into Elasticsearch, where time-based indices are generally used. Data that is similar and has the same retention period is often kept in the same index, so you may end up with a time-based index per application. Each time-based index is usually associated with an index template. Types can be used sparingly, but in most cases you can just have a single type.
In almost all cases where data is immutable and not updated, Elasticsearch is left to assign the document ID. As Elasticsearch is a search engine, it is easy to search based on the content of the logs, and I rarely see the document ID itself used for searching.
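To make that concrete (the index pattern, shard count and fields here are assumptions, so adjust them to your data), an index template that is applied to every matching time-based index could look roughly like this:

```
PUT _template/application_logs
{
  "template": "application_logs-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "logs": {
      "properties": {
        "host":        { "type": "keyword" },
        "application": { "type": "keyword" },
        "message":     { "type": "text" }
      }
    }
  }
}
```

Every new daily or monthly index whose name matches the pattern then picks up these settings and mappings automatically.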
Thank you very much for your advice! Great! Brilliant! Your advice is exactly what I am looking for.
So, here is the last portion of questions, inline.
A common pattern for a centralised logging solution is to gather logs using Beats and then enrich and parse them in Logstash and/or the new Elasticsearch ingest node feature.
ER: Fine, I will use the Beats or Logstash shippers themselves.
These log records are then typically indexed into Elasticsearch, where time-based indices are generally used.
ER: Great. How can I configure Elasticsearch to properly create only time-based indices? Where do I configure that? For example, I have a lot of application logs every day. Will ES create indices for every day? If so, where do I configure the shippers or forwarders to put the logs into the appropriate index?
Data that is similar and has the same retention period is often kept in the same index, so you may end up with a time-based index per application. Each time-based index is usually associated with an index template. Types can be used sparingly, but in most cases you can just have a single type.
ER: Great and acceptable. What is the standard retention period for logs?
In almost all cases where data is immutable and not updated, Elasticsearch is left to assign the document ID. As Elasticsearch is a search engine, it is easy to search based on the content of the logs, and I rarely see the document ID itself used for searching.
This is configured in Logstash through the Elasticsearch output plugin. There you can use conditionals as well as a date pattern in the index name to send events to the appropriate index.
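A rough sketch of what that can look like (the field and index names are made up for illustration):

```
output {
  # Route events to different time-based indices based on a field value.
  if [application] == "billing-api" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "billing-api-logs-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "application_logs-%{+YYYY.MM.dd}"
    }
  }
}
```

You can also reference a field directly in the index name, e.g. index => "%{application}-%{+YYYY.MM.dd}", which avoids long conditional chains as long as the field is always present.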
There is no standard - it all depends on what your requirements are. I have seen users keeping logs from just a few days up to several years.
Christian, where can I get information about the usage of conditionals in the Logstash output config? Could you please provide an example of such a config? What can be configured through the Logstash output config?
Here is a guide to getting started with Logstash. Once you understand how Logstash works, there are quite a few examples to look at, but you should be able to find a lot of examples by searching this forum as well.
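To give a feel for how the pieces fit together (all names, the port and the grok pattern are assumptions, not something from your environment), a very small end-to-end pipeline might look like:

```
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    # Assumes a simple "LEVEL message" line format; replace with your real pattern.
    match => { "message" => "%{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "application_logs-%{+YYYY.MM.dd}"
  }
}
```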