Just finished my first ELK deployment, on an HP EliteDesk (i5, 8 GB RAM, 120 GB SSD) that cost $90.
It runs CentOS 8 Minimal with Docker, Docker Compose, Cockpit, Portainer, and deviantony/docker-elk, plus a few tweaks: adding Beats and HTTPS. I figured out the default Java heap of 256 MB wasn't enough, so I raised it to 6 GB.
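In case anyone wonders, with docker-elk the heap is set through the `ES_JAVA_OPTS` environment variable in `docker-compose.yml`; mine looks roughly like this (the exact layout may differ between docker-elk versions):

```yaml
# docker-compose.yml (excerpt); heap raised from the 256 MB default.
services:
  elasticsearch:
    environment:
      # -Xms and -Xmx should be set to the same value
      ES_JAVA_OPTS: "-Xms6g -Xmx6g"
```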
Right now I ingest roughly 5 events/second, with about 200 fields across 5 indices.
Where can I read up on index best practices? The dashboards must be able to show the last week's events quickly; it's fine if older events are slower to query.
Can you give me some tips? I'm worried that with the default config my build will fall over within a week.
Hi,
I guess the "best practices" depend on your use case. For example, if you are indexing append-only log data/events, you should have a closer look at data streams. If you frequently update or delete documents, you should use "normal" indices instead.
No matter what you use, I suggest you have a look at index lifecycle management (ILM) policies; they give you several options to keep the data on your cluster nice and tidy. If you have a multi-node deployment, you can also shrink the shard count after rolling an index over, which can save you some disk space.
Well, if you only store log data, I would suggest using data streams. They also make index lifecycle management (ILM) easier, in my opinion (data streams are backed by hidden indices).
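To give a rough idea (the names `logs-myapp` and `logs-policy` are placeholders, and the ILM policy is sketched further down): a data stream only needs a matching index template with a `data_stream` section, and it is created automatically when the first document is indexed:

```
# Sketch: index template that enables a data stream for logs-myapp-*.
PUT _index_template/logs-myapp
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "logs-policy"
    }
  }
}

# Indexing the first document creates the data stream.
# Every document in a data stream must carry a @timestamp field.
POST logs-myapp-default/_doc
{
  "@timestamp": "2021-06-01T12:00:00Z",
  "message": "user logged in"
}
```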
There are multiple reasons to use ILM (see the policy sketch after this list):

- automatically delete logs that are too old (saves disk space)
- automatically manage index and shard sizes (keeps the search and indexing performance of your cluster as high as possible)
- automatically spread your data across multiple backing indices (speeds up search queries)
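As a sketch of such a policy (the name and the thresholds are made-up examples, tune them to your data volume), a rollover in the hot phase plus a delete phase already covers the basics:

```
# Sketch: roll over at 50 GB or 7 days, delete indices older than 30 days.
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

Referencing the policy from the index template (as in the data stream sketch above) is all it takes; ILM then rolls the backing indices over and eventually deletes them without manual work.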
Regarding shard sizes and shard count, there is a good blog post which you should have a look at.
If you have a single-node cluster, I guess using only primary shards (zero replicas) would be best, since you can't set up the cluster to survive a node failure anyway.
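As a concrete sketch (the index name is a placeholder): for existing indices you can drop the replicas via the settings API, and for new indices you would add `index.number_of_replicas: 0` to the template settings shown above:

```
# Sketch: zero replicas, since a single node cannot host them anyway.
PUT logs-myapp-default/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```

Unassigned replica shards are also what keeps a single-node cluster permanently yellow, so this gets you back to a green cluster state.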
In general, you should have a look at multi-node deployments, since scaling across multiple nodes is one of the biggest advantages Elasticsearch has (at least if your log data is important).
Another thing I find very useful when indexing log data is ingest pipelines (your node must have the ingest role to use this feature). They let you transform and enrich incoming data before it is indexed.
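As a small sketch (the pipeline name and the fields are made up), a pipeline that tags every event and parses a simple log line could look like this:

```
# Sketch: tag incoming events and grok a source IP out of the message.
PUT _ingest/pipeline/logs-enrich
{
  "description": "Tag events and parse a simple log line",
  "processors": [
    {
      "set": {
        "field": "event.pipeline",
        "value": "logs-enrich"
      }
    },
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:source.ip} %{GREEDYDATA:event.original}"],
        "ignore_failure": true
      }
    }
  ]
}
```

You can apply it per request with `?pipeline=logs-enrich` or make it the default for an index via the `index.default_pipeline` setting.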
The Elastic cosmos is quite large, so yes, there sure is a lot to learn.