So, I have recently started playing with the Elastic Stack and have seen some interesting use cases, such as ingesting pfSense logs.
While looking into some Logstash and Elasticsearch configurations, I noticed some interesting patterns, which raised the following questions:
Logstash/Elasticsearch Specific questions:
Some people use numbers at the beginning of Logstash configuration file names. Is this to determine the loading priority in Logstash, or is it just an arbitrary choice by the admin?
Some stages of the event pipeline, like filters, occur in more than one file in the logstash/config.d/ directory. Does Logstash load all those individual files into memory and, assuming there are no syntax errors, apply the applicable ones once there is a match? How does that work exactly?
Can you configure the output phase to create custom index names? And if so, does that affect the template that will be attached to the index in Elasticsearch, resulting in mapping errors and wrong data types?
What are the best practices when ingesting data from multiple sources? Is it advised in those instances to give each source its own separate index in the output section of the config, so they are isolated once they are in Elasticsearch? Are multiple pipelines the best option?
Elasticsearch/Kibana Specific questions:
How exactly does the index name relate to an Elasticsearch template? Can the template be chosen manually? How? How does Elasticsearch choose the template to use?
Can a mapping be corrected or extended after data has already been indexed? How?
Logstash now supports multiple pipelines; some of the naming conventions are carryovers from before that. Configs were probably split into multiple files to make editing easier or to reuse parts elsewhere. Logstash basically does:
# CONFIG_PATH is a placeholder for your Logstash config directory
cat "$CONFIG_PATH"/* > combined.conf
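All the config files are concatenated in sort order to create the total config. That is why people put numbers at the beginning of the file names: the prefix controls where each file lands in the combined config.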
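To make the ordering concrete, here is a minimal Python sketch of the same idea. It is only a model of the concatenation, not Logstash's actual loader, and the file names (10-input.conf and so on) are made up for illustration.

```python
import tempfile
from pathlib import Path


def combine_configs(config_dir):
    """Concatenate every file in config_dir in sorted file-name order."""
    files = sorted(
        (p for p in Path(config_dir).iterdir() if p.is_file()),
        key=lambda p: p.name,
    )
    return "\n".join(p.read_text() for p in files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names; they are written out of order on purpose.
        Path(tmp, "30-output.conf").write_text("output { stdout {} }")
        Path(tmp, "10-input.conf").write_text("input { beats { port => 5044 } }")
        Path(tmp, "20-filter.conf").write_text('filter { mutate { add_tag => ["seen"] } }')
        # The numeric prefixes decide the order of the combined config.
        print(combine_configs(tmp))
```

Because everything ends up in one combined config, the order of the filter blocks is the order of the file names.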
The Logstash output can create custom index names; the samples often show variable substitution for the beat name and version. You can add your own variables. We add fields in Beats for campus and application. We often have logic in the filter for specific applications, but can then create separate indices for each combination. Note: this pretty much makes it impossible for Logstash to manage ILM; we have to bootstrap each ILM index before indexing to a new pattern combo.
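As a rough illustration of the variable substitution, the Python sketch below mimics how a `%{[field][subfield]}` reference in an index name gets filled in from the event. It is a simplified model, not Logstash's implementation, and the field names ([agent][version], [fields][campus]) are assumptions chosen to match the campus example.

```python
import re

# Matches references such as %{[fields][campus]}
FIELD_REF = re.compile(r"%\{((?:\[[^\]]+\])+)\}")


def render_index(pattern, event):
    """Replace each %{[a][b]} reference with the matching nested event field."""
    def lookup(match):
        value = event
        for key in re.findall(r"\[([^\]]+)\]", match.group(1)):
            if not isinstance(value, dict) or key not in value:
                # This sketch leaves unresolved references untouched.
                return match.group(0)
            value = value[key]
        return str(value)

    return FIELD_REF.sub(lookup, pattern)


if __name__ == "__main__":
    # Hypothetical event with custom fields added on the Beats side.
    event = {
        "agent": {"version": "7.6.2"},
        "fields": {"campus": "boston", "application": "apache"},
    }
    print(render_index("filebeat-%{[agent][version]}-%{[fields][campus]}", event))
```

Every distinct combination of field values produces a distinct index name, which is why each new combination needs its own ILM bootstrap.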
There isn't really one best practice; you have to prioritize the rules for your needs. There are some critical issues for an Elastic Stack: too many shards, shards that are too big, etc. I would recommend ILM rollover to control shard size. We combine data with a common mapping IF security needs don't require separate indices. (We don't license field level security.)
We have under 10 major pipelines but 20-30 "active" indices. Some pipelines have dedicated indices; others use the variable substitution above.
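When an index is created, all templates whose "index_patterns" match the new index name are applied in "template priority" order (the "order" setting on legacy templates), with later templates overriding earlier ones where they overlap. So a template with pattern "filebeat-*" and priority 1 would apply before a template with pattern "*-boston" and priority 100 for a new index called "filebeat-7.6.2-boston".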
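The matching and merge order can be sketched in Python as follows. This is a simplified model of legacy (`_template`) behaviour, where every matching template is merged from lowest to highest order; newer composable index templates instead pick a single highest-priority match. The template names and settings are hypothetical, and `fnmatchcase` only approximates Elasticsearch's `*` wildcard.

```python
from fnmatch import fnmatchcase


def resolve_templates(index_name, templates):
    """Return the matching template names in apply order, plus merged settings."""
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, pat) for pat in t["index_patterns"])
    ]
    # Lower order is applied first, so higher order wins on conflicts.
    matching.sort(key=lambda t: t["order"])
    merged = {}
    for t in matching:
        merged.update(t["settings"])
    return [t["name"] for t in matching], merged


if __name__ == "__main__":
    # Hypothetical templates modelled on the example above.
    templates = [
        {"name": "boston", "index_patterns": ["*-boston"], "order": 100,
         "settings": {"number_of_replicas": 2}},
        {"name": "filebeat", "index_patterns": ["filebeat-*"], "order": 1,
         "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    ]
    names, settings = resolve_templates("filebeat-7.6.2-boston", templates)
    print(names)
    print(settings)
```

A custom index name therefore only gets the expected mapping if it still matches the pattern of the template that carries it.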
You can't change the mapping of existing fields. To correct one, create a new index with the right mapping and copy the data over; see the Reindex API.
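A minimal sketch of what that looks like in practice, written as a Python helper that only builds the two requests (create the corrected index, then copy the documents) without sending them. The index names and the `src_ip` field are hypothetical; the request bodies follow the typeless mapping format of Elasticsearch 7.x.

```python
import json


def reindex_plan(old_index, new_index, corrected_properties):
    """Build the requests needed to move data into an index with a fixed mapping."""
    return [
        # 1. Create the new index with the corrected field mappings.
        ("PUT", f"/{new_index}", {"mappings": {"properties": corrected_properties}}),
        # 2. Copy all documents from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": old_index},
                               "dest": {"index": new_index}}),
    ]


if __name__ == "__main__":
    # Hypothetical example: src_ip was mapped as text and should be an ip field.
    plan = reindex_plan("pfsense-2020.05", "pfsense-2020.05-v2",
                        {"src_ip": {"type": "ip"}})
    for method, path, body in plan:
        print(method, path)
        print(json.dumps(body, indent=2))
```

It is also worth fixing the index template at the same time, so that newly created indices pick up the corrected mapping.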