You have quite a few objectives, so to keep things simple I'll start by explaining the architecture I run, which covers 90+ servers and about 300GB of logs a day. There are many possible designs, and this one was my first attempt, so there may be both good and bad ideas in here.
Sorting out and making indexes
The first thing to think about is your indexes. The bigger they are, the more brute force you need when searching.
I tend to use the "type" field to sort out my data, but feel free to get creative for your needs.
input {
  file {
    type => "apache-access-log"
    path => ["/var/log/http/access_log"]
  }
}
output {
  elasticsearch {
    host => localhost
    index => "%{type}-%{+YYYY.MM.dd}"
  }
  # stdout { codec => "json" }
}
(Note the type is lowercase here: since it gets interpolated into the index name, and Elasticsearch index names must be lowercase.)
Architecture
Some ideas on design
- Logstash Forwarder -> Queue -> parsing indexer -> Elasticsearch
- Logstash Parser and Indexer -> Elasticsearch
Or, what I have:
- Logstash (parsing) -> queue -> sub-parser & indexer -> Elasticsearch
Explanation of my architecture
I do the bulk of my parsing on every server: merging multi-line events, parsing the data, and tagging the lines with important information like "web_access_logs", "Prod", etc. (90 servers doing the parsing is better than a couple of central ones, IMHO.)
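A minimal sketch of what that shipper-side work can look like in a Logstash filter block (the tag names are the examples from above; the multiline pattern, which merges whitespace-indented continuation lines, is an assumption):

```
filter {
  # Merge continuation lines (e.g. stack traces) into the previous event
  multiline {
    pattern => "^\s"
    what    => "previous"
  }
  # Tag each event so the central indexer can route it without re-parsing
  mutate {
    add_tag => ["web_access_logs", "Prod"]
  }
}
```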
The queue I run is Redis, though I think I will be moving to Apache Kafka; pick whichever one you like the most.
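Shipping into Redis from each server is just another output block. A sketch, assuming a list-based queue (the host name and key are placeholders, not my real ones):

```
output {
  redis {
    host      => "redis.example.com"  # placeholder queue host
    data_type => "list"
    key       => "logstash"
  }
}
```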
The indexer is where I sort out the data: determine which index I want to place each event in, throw away data I don't need, send information to Nagios, and do any remaining generic parsing I might want.
Note: this indexer has to be able to process every log, so you may need several of them, which is why there is a queue.
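The indexer side then reads from the same Redis list and does the routing. A sketch, using the same placeholder host/key; the healthcheck drop rule and the Nagios command file path are illustrative assumptions, not my exact setup:

```
input {
  redis {
    host      => "redis.example.com"  # same placeholder host/key as the shippers
    data_type => "list"
    key       => "logstash"
  }
}

filter {
  # Throw away noise we never want to index (illustrative rule)
  if [message] =~ "healthcheck" {
    drop { }
  }
}

output {
  elasticsearch {
    host  => localhost
    index => "%{type}-%{+YYYY.MM.dd}"  # the "type" set on each server picks the index
  }
  # Forward tagged events to Nagios (command file path is an assumption)
  if "nagios_alert" in [tags] {
    nagios { commandfile => "/var/lib/nagios3/rw/nagios.cmd" }
  }
}
```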
Elasticsearch:
Here I do a couple of things:
- Indexes are daily.
- I create mappings with aliases. An alias of "loadavg" is easier to query than "loadavg-2015.06.21, loadavg-2015.06.22" or "loadavg*", which could match other indexes I might not want.
- I use Curator in my crontab to clean up older data; after all, we only have so much disk space. You might find other approaches, like snapshots or exporting the data: https://github.com/elastic/curator
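One way to get the alias onto every daily index automatically is to put it in an index template, so each new index picks it up at creation time. A sketch against the Elasticsearch REST API (the template name and pattern are examples):

```
curl -XPUT 'localhost:9200/_template/loadavg_template' -d '{
  "template": "loadavg-*",
  "aliases": { "loadavg": {} }
}'
```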
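And the crontab entry for Curator might look something like this (a sketch assuming Curator 3.x command-line flags; check `curator --help` for your version, and the 30-day retention is just an example):

```
# Delete daily indexes older than 30 days, every night at 02:00
0 2 * * * curator --host localhost delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d'
```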