Best configuration for analysing multiple servers

I'm hoping that someone can offer some advice or experience. So far I have a proof of concept working, analysing a single web server.
In production we have multiple servers (web, application, DBs) as well as multiple OSes (Linux, CentOS, Windows) that I want to analyse. At the moment it is only my team looking at the data as a debug tool, but at a later stage I might also want to restrict/grant access to certain dashboards in Kibana for other departments. What would be the best way of setting up an ELK stack for this? So far my ideas are:

  1. One index for the lot, and try to get Logstash to map to the same terms where possible
  2. An index for each server (not quite sure how to do this)
  3. Some other method I've not thought of.

Occasionally, I'd like to compare certain stats between servers (to trace the actions of a single IP through the system, or to compare loads between web frontends, app servers and DBs), but I'm not sure whether this is possible if I use separate indexes.

I'm after the pros/cons of each, and thoughts on whether I'm on the right track or completely missing the point of how to set this up at scale.

Thanks for your input.

You have quite a few objectives. To keep things simple to start, I'll explain the architecture I have, which handles 90+ servers and 300 GB of logs a day. Of course there are many possible designs, and this one was my first attempt, so there may be good and bad ideas in here.

Sorting out and making indexes

First thing to think about is your indexes. The bigger they are the more brute force you need when searching.

So I tend to use the "type" field to sort out my data but feel free to get creative for your needs.

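input {
  file {
    # Lowercase, because this value ends up in the index name below
    type => "apache-access-log"
    path => ["/var/log/http/access_log"]
  }
}
output {
  elasticsearch {
    host => "localhost"
    # One index per type per day; Elasticsearch index names must be lowercase
    index => "%{type}-%{+YYYY.MM.dd}"
  }
  # stdout { codec => "json" }
}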
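To make the index pattern above concrete, here is a minimal Python sketch of how "%{type}-%{+YYYY.MM.dd}" expands for a single event. It is only an illustration, not Logstash code: the function name daily_index_name is made up, and the lowercasing is a normalisation added here because Elasticsearch rejects index names containing uppercase letters. Logstash takes the date from the event's @timestamp, which is in UTC.

```python
from datetime import datetime, timedelta, timezone


def daily_index_name(event_type: str, timestamp: datetime) -> str:
    """Mimic the pattern "%{type}-%{+YYYY.MM.dd}" for one event (illustrative)."""
    # The date part comes from the event timestamp expressed in UTC.
    utc = timestamp.astimezone(timezone.utc)
    # Assumption of this sketch: normalise to lowercase, since Elasticsearch
    # does not accept uppercase characters in index names.
    return f"{event_type.lower()}-{utc:%Y.%m.%d}"


if __name__ == "__main__":
    # 01:00 at UTC+2 on 22 June is still 21 June in UTC.
    local = datetime(2015, 6, 22, 1, 0, tzinfo=timezone(timedelta(hours=2)))
    print(daily_index_name("apache-access-log", local))
```

The practical consequence is that each log type gets its own small index per day, which keeps searches and clean-up cheap.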

Architecture

Some ideas on design

  • Logstash Forwarder -> Queue -> parsing indexer -> Elasticsearch
  • Logstash Parser and Indexer -> Elasticsearch

Or what I have

Logstash and Parsing -> queue -> (Sub-parser ) & Indexer -> Elasticsearch

Explanation of my architecture

I do the bulk of my processing on every server: merging multi-line entries, parsing the data, and tagging the lines with important information such as "web_acces_logs", "Prod", etc. (90 servers parsing is better than a couple, IMHO.)

The queue I run is Redis, though I think I will be moving to Apache Kafka. Pick whichever one you like the most.

The indexer is where I sort out the data: determine which index I want to place it in, throw away data, send information to Nagios, and do any generic parsing I might want.
Note: the indexer has to be able to process every log, so you may need several of them, which is why there is a queue.
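The sorting step the indexer performs can be pictured with a small Python sketch. In practice this logic lives in Logstash conditionals; the routing table, the "debug" tag and the "unsorted" fallback below are all hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical routing table: event type -> index prefix.
INDEX_BY_TYPE = {
    "apache-access-log": "web-access",
    "loadavg": "loadavg",
}


def route_event(event: dict) -> Optional[str]:
    """Return the index prefix for an event, or None to throw it away."""
    # Drop events carrying a (hypothetical) noise tag.
    if "debug" in event.get("tags", []):
        return None
    # Unknown types fall back to a catch-all prefix instead of being lost.
    return INDEX_BY_TYPE.get(event.get("type"), "unsorted")


if __name__ == "__main__":
    print(route_event({"type": "loadavg", "tags": ["Prod"]}))
    print(route_event({"type": "loadavg", "tags": ["debug"]}))
```

Because every event passes through this one decision point, it is also the natural place to hang alerting or extra parsing.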

Elasticsearch:

Here I do a couple of things:

  • Indexes are daily

  • I create mappings with aliases. An alias of "loadavg" is easier than "loadavg-2015.06.21, loadavg-2015.06.22" or "loadavg*", which could bring in other indexes I might not want.

  • I use Curator in my crontab to clean up older data; after all, we only have so much disk space. You might find other ways, like snapshots or exporting the data.
    https://github.com/elastic/curator
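For a feel of what a retention job has to decide with daily indexes, here is a rough Python sketch that picks out index names older than a cut-off. This is not how Curator is implemented; expired_indexes and keep_days are names invented for the example, and it only selects names without deleting anything.

```python
from datetime import date, datetime, timedelta


def expired_indexes(names, today: date, keep_days: int):
    """Return daily index names whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        # Daily indexes end in "-YYYY.MM.dd"; split on the last hyphen.
        _prefix, _sep, suffix = name.rpartition("-")
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, so leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["loadavg-2015.06.21", "loadavg-2015.06.22", ".kibana"]
    print(expired_indexes(names, today=date(2015, 6, 30), keep_days=8))
```

Daily indexes make this cheap: dropping a whole index is far less work for Elasticsearch than deleting individual documents by age.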

What do you use as the indexer?

So you have multiple indexes in ES? Do you ever need to do queries across multiple indexes in Kibana?

Logstash is what I use as the indexer. I have 2 to 4 running for redundancy. I also tend to give it extra heap space and use the "-w" CLI option to get more threads processing the data (that might have been overkill).

I don't normally query across indexes, but that is what aliases would help you with. Or you can always query several at once, like Host:9200/index1,index2/_search
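Both options can be sketched in a few lines of Python. The first helper builds a request body in the shape the Elasticsearch _aliases endpoint accepts (a list of "add" actions); the second builds the comma-separated search path. The helper names are made up, and the sketch only constructs the payloads without sending them anywhere.

```python
import json


def alias_actions(alias: str, indexes):
    """Build a request body for the Elasticsearch _aliases endpoint."""
    return {"actions": [{"add": {"index": i, "alias": alias}} for i in indexes]}


def search_path(indexes):
    """Build the path for searching several indexes in one request."""
    return "/" + ",".join(indexes) + "/_search"


if __name__ == "__main__":
    days = ["loadavg-2015.06.21", "loadavg-2015.06.22"]
    # Body to POST to /_aliases so that "loadavg" covers both daily indexes.
    print(json.dumps(alias_actions("loadavg", days), indent=2))
    # Alternative: name the indexes directly in the search URL.
    print(search_path(days))
```

An alias keeps dashboards and saved queries stable as new daily indexes appear, whereas the comma-separated form is handy for one-off comparisons.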

Thanks for your ideas. You've given me a path to explore down, so I don't feel totally lost :smile:

When you say aliases, do you mean ES aliases? https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

Glad to give you a few breadcrumbs. Of course, the above is my solution based on discovery; your mileage may vary.

Yes ES Aliases is what I was talking about for CROSS Index searching, also you can