Enterprise Log Collection Architecture with Elastic

eamonn · November 14, 2017, 5:34pm

I am tasked with designing a log collection architecture that collects application and infrastructure logs from a number of Linux and Windows servers and databases. These are virtualized, clustered and load balanced, and they are separated from the tooling environment by firewalls.
I need to understand whether I need a local log collector in the remote environments,
and how this would work with NSX, ESX and JBoss.
It was suggested to me to use the following configuration:
logstash > rabbitMQ > |firewall| > logstash > elasticsearch
Alternatively, could/should I deploy a syslog server in the cluster of 9 RHEL servers?
Also, should I be considering Beats instead of Logstash?

Any advice would be very welcome

magnusbaeck · November 14, 2017, 7:46pm

I need to understand if I need a local log collector in the remote environments.

That's recommended and often the only option.

It was suggested to me to use the following configuration:
logstash > rabbitMQ > |firewall| > logstash > elasticsearch

That's a good option. So the second Logstash instance can connect to the RabbitMQ instance on the other side of the firewall?

Also, should I be considering Beats instead of Logstash?

For collecting text logs I'd recommend Filebeat because of its lower overhead (disk, RAM, and probably CPU too).

eamonn · November 17, 2017, 4:30pm

I am wondering the following...

What does adding RabbitMQ to this architecture give us that Logstash cannot do on its own?

also

Do I need to cluster and/or load balance Logstash for resilience?
Also, the same question for Elasticsearch and Kibana.

magnusbaeck · November 20, 2017, 6:48am

What does adding RabbitMQ to this architecture give us that Logstash cannot do on its own?

If you place a RabbitMQ broker outside the firewall, it becomes a common service that hosts both inside and outside the firewall can connect to. I'd prefer that over having hosts outside the firewall connect to Logstash inside the firewall, or having Logstash outside the firewall connect to Elasticsearch on the inside.

It also makes it trivial to run multiple Logstash instances that feed from the same queue, increasing fault tolerance and load balancing.
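To make the "multiple Logstash instances feeding from the same queue" point concrete, here is a minimal sketch of the competing-consumers pattern in plain Python. It assumes an in-process queue.Queue as a stand-in for the RabbitMQ queue and threads as stand-ins for Logstash instances; the worker names are hypothetical, and a real broker would add acknowledgements and redelivery on top of this.

```python
import queue
import threading


def run_competing_consumers(events, n_consumers=3):
    """Drain one shared queue with several workers (competing consumers).

    Each event is taken off the queue by exactly one worker, so adding
    workers spreads the load, and if one worker stops the others keep draining.
    Returns a list of (worker_name, event) pairs in processing order.
    """
    broker = queue.Queue()          # stand-in for the RabbitMQ queue
    for event in events:
        broker.put(event)

    processed = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                event = broker.get_nowait()
            except queue.Empty:
                return              # queue drained; a real consumer would block instead
            with lock:
                processed.append((name, event))

    # Hypothetical names for the Logstash instances behind the firewall.
    threads = [threading.Thread(target=worker, args=(f"logstash-{i}",))
               for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Calling run_competing_consumers(range(1000)) processes every event exactly once, however the work happens to be split between the three workers.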

Do I need to cluster and/or load balance Logstash for resilience?

Logstash can't be clustered per se but you can run multiple instances that process events.

Also, the same question for Elasticsearch and Kibana.

Kibana can't be clustered. Whether you need to cluster ES depends on how much data you need to process and what level of fault tolerance you need.
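For the fault-tolerance side of the Elasticsearch question, the arithmetic behind replicas is simple enough to sketch. This assumes the standard rule that Elasticsearch never allocates a replica on the same node as its primary; it looks only at data copies and deliberately ignores master-election quorum, disk space and shard sizing, which also matter when sizing a cluster.

```python
def shard_copies(primaries, replicas):
    """Total shard copies for an index: every primary plus its replicas."""
    return primaries * (replicas + 1)


def min_nodes_for_all_copies(replicas):
    """A replica never sits on the same node as its primary, so all copies
    of one shard need replicas + 1 distinct nodes to be allocated."""
    return replicas + 1


def node_failures_survivable(replicas, nodes):
    """Nodes that can fail without losing any shard's data, assuming every
    allocatable copy is allocated. Ignores master quorum and disk capacity."""
    if nodes < min_nodes_for_all_copies(replicas):
        # Some replicas stay unassigned, so only `nodes` copies of each shard exist.
        replicas = nodes - 1
    return replicas
```

For example, an index with 5 primaries and 1 replica occupies 10 shard copies, needs at least 2 nodes to place them all, and on a 3-node cluster survives the loss of any single node.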

eamonn · November 20, 2017, 11:02am

Thank you very much for your assistance so far. It is much appreciated.

Am I correct in saying that Filebeat will do the load balancing once it is aware of the Logstash instances?

magnusbaeck · November 20, 2017, 12:47pm

Am I correct in saying that Filebeat will do the load balancing once it is aware of the Logstash instances?

Yes, the Beats programs can distribute events between multiple Logstash servers, creating a somewhat crude form of load balancing.
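In Filebeat this is enabled by listing several `hosts` under `output.logstash` and setting `loadbalance: true`. Without it, Filebeat sends to one host and only switches when that host becomes unresponsive. The sketch below is a toy round-robin model of that "crude" distribution with failover, not Filebeat's actual scheduling code; the host names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy model of a shipper spreading batches over several Logstash hosts.

    Hosts marked down are skipped until marked up again. This is a
    simplification, not Filebeat's real algorithm.
    """

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.down = set()
        self._ring = cycle(self.hosts)

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def next_host(self):
        # Try each host at most once per call, skipping unhealthy ones.
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if host not in self.down:
                return host
        raise RuntimeError("no Logstash host available")
```

With hosts ["ls1:5044", "ls2:5044", "ls3:5044"], successive calls cycle through all three; after mark_down("ls2:5044") the batches alternate between ls1 and ls3.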

eamonn · November 20, 2017, 1:24pm

Hi Magnus,
The log collection system will collect application and OS logs from Windows Server 2012 and RHEL VMs, as well as from several network devices.
I am advised to use Filebeat for the RHEL OS and application logs, and I am wondering

whether this is a good idea,
or whether I should have two agents, one for the OS and one for applications, or would this double the overhead for no benefit?
Also, is there a Filebeat template for standard RHEL log file input?
There is also an Exadata appliance to be monitored, which will send Linux OS and DB audit logs. Exadata suggests using syslog to send the OS logs to the collector. Is this OK?

Rgds

Eamonn

magnusbaeck · November 20, 2017, 2:36pm

whether this is a good idea

Sure.

or whether I should have two agents, one for the OS and one for applications, or would this double the overhead for no benefit?

I don't think there are any runtime benefits of running two Filebeat instances, but deployment-wise you might want to deploy OS-specific configurations as part of the machine provisioning while configuration of application log collection is owned by the application deployment scripts.

Also, is there a Filebeat template for standard RHEL log file input?

What do you mean by template?

There is also an Exadata appliance to be monitored, which will send Linux OS and DB audit logs. Exadata suggests using syslog to send the OS logs to the collector. Is this OK?

And by collector you mean Logstash? Sure, that's fine.

eamonn · November 20, 2017, 3:11pm

By template I mean RHEL OS-specific configuration templates for Filebeat.

Should I be looking at Auditbeat as well?

Also,

do you know how I can collect logs from IBM SAN controllers, SAN directors and SAN storage arrays? I am finding it hard to get information on this.

magnusbaeck · November 20, 2017, 3:32pm

By template I mean RHEL OS-specific configuration templates for Filebeat.

Nothing I know anything about.

Should I be looking at Auditbeat as well?

If you want to collect that kind of information, sure. I can't answer for your needs.

Do you know how I can collect logs from IBM SAN controllers, SAN directors and SAN storage arrays? I am finding it hard to get information on this.

Most appliances support syslog; other than that, I don't know.

eamonn · November 20, 2017, 4:38pm

Hi Magnus

In the configuration suggested above

Filebeat > logstash > rabbitMQ > |firewall| > logstash > elasticsearch

can you explain what the purpose of the first 'local' instance of Logstash is. Why is this needed and what value do you see it adding?

magnusbaeck · November 20, 2017, 8:50pm

It's quite simple: Filebeat can't send to RabbitMQ on its own. If you choose Kafka or Redis instead of RabbitMQ you don't need that Logstash instance since Filebeat has native support.

eamonn · November 21, 2017, 9:41am

Is this also true for Winlogbeat and syslog?

Is there any particular advantage to doing the filtering in the local Logstash instance before transit to the remote Logstash/Elasticsearch platform?

magnusbaeck · November 21, 2017, 10:10am

Is this also true for Winlogbeat and syslog?

Winlogbeat and Filebeat behave the same in this regard. Syslog daemons typically only send over the syslog protocol (but you could configure Filebeat to read the logs from disk). Logstash can capture syslog messages sent over the network.
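When Logstash captures syslog from the network, the first thing to decode is the `<PRI>` prefix, which packs facility and severity as facility * 8 + severity. The sketch below parses a classic RFC 3164 ("BSD") line to show what a syslog input or grok pattern has to extract; it is an illustration of the format, not Logstash's implementation, and it does not handle RFC 5424 messages.

```python
import re

SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# RFC 3164 line: <PRI>Mmm dd hh:mm:ss HOST TAG: message
_BSD_SYSLOG = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<rest>.*)$"
)


def parse_bsd_syslog(line):
    """Split an RFC 3164 syslog line into facility, severity and fields."""
    m = _BSD_SYSLOG.match(line)
    if m is None:
        raise ValueError(f"not an RFC 3164 syslog line: {line!r}")
    pri = int(m.group("pri"))
    if pri > 191:                        # 23 facilities * 8 + 7 is the maximum
        raise ValueError(f"PRI out of range: {pri}")
    return {
        "facility": pri // 8,            # e.g. 4 = auth, 16..23 = local0..local7
        "severity": SEVERITIES[pri % 8],
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "message": m.group("rest"),      # TAG plus free-text message
    }
```

For the example line from RFC 3164, `<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8`, PRI 34 decodes to facility 4 (auth) and severity "crit".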

Is there any particular advantage to doing the filtering in the local Logstash instance before transit to the remote Logstash/Elasticsearch platform?

Generally I'd say no.

eamonn · December 10, 2017, 2:12pm

Hi Magnus

I want to send syslog log files securely from my VMware servers through my management zone firewall to Logstash.
I need to understand how this could work:

Do I need to use syslog-ng, or can Logstash integrate securely with syslog?
Can Logstash take input from syslog-ng?
Will I need to deploy syslog (or syslog-ng) servers?
Will I be able to use mutual certificate authentication?

magnusbaeck · December 11, 2017, 6:31am

Do I need to use syslog-ng, or can Logstash integrate securely with syslog?

The syslog daemons I know about are named rsyslog and syslog-ng. Both will work with Logstash but authentication might be tricky to set up.

Can Logstash take input from syslog-ng?

Yes, e.g. with a tcp, udp, or syslog input.
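One detail worth checking when rsyslog or syslog-ng forwards over TCP is message framing. RFC 6587 describes newline-terminated framing and "octet counting" (a byte length, a space, then the message). A line-oriented receiver splits on newlines, so multi-line messages such as stack traces get broken up unless octet counting or escaping is used. The sketch below shows both framings; which framing your Logstash input accepts is an assumption to verify against its documentation.

```python
def frame_newline(msg):
    """Non-transparent framing: the message is terminated by LF."""
    if "\n" in msg:
        raise ValueError("newline framing cannot carry embedded newlines")
    return msg.encode("utf-8") + b"\n"


def frame_octet_counting(msg):
    """Octet-counting framing: MSG-LEN SP SYSLOG-MSG (length in bytes)."""
    body = msg.encode("utf-8")
    return str(len(body)).encode("ascii") + b" " + body


def split_octet_counted(stream):
    """Recover messages from a byte stream of octet-counted frames."""
    messages = []
    i = 0
    while i < len(stream):
        sp = stream.index(b" ", i)               # end of the length prefix
        length = int(stream[i:sp].decode("ascii"))
        start = sp + 1
        body = stream[start:start + length]
        if len(body) < length:
            raise ValueError("truncated frame")
        messages.append(body.decode("utf-8"))
        i = start + length
    return messages
```

Because the receiver reads exactly MSG-LEN bytes, a message containing newlines survives intact, which newline framing cannot guarantee.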

Will I need to deploy syslog (or syslog-ng) servers?

Not sure what you mean. You obviously have to run a syslog daemon on the servers containing the logs, but I'm sure you already do that.

Will I be able to use mutual certificate authentication?

I believe Logstash's tcp input supports SSL peer certificate verification but I've never tried it. I don't know if rsyslog or syslog-ng supports it.
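Whatever the sending daemon, mutual certificate authentication comes down to the listener refusing any client that cannot present a certificate signed by a trusted CA, while the client verifies the server in the usual way. The sketch below shows that idea with Python's standard ssl module rather than Logstash's own settings; the certificate, key and CA paths are hypothetical placeholders and are only loaded when given.

```python
import ssl


def make_mtls_server_context(certfile=None, keyfile=None, client_ca=None):
    """Server-side TLS context that requires a valid client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED        # reject clients without a trusted cert
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the listener's own identity
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CA that signs client certs
    return ctx


def make_mtls_client_context(certfile=None, keyfile=None, server_ca=None):
    """Client-side context: verifies the server and presents its own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # the shipper's identity
    return ctx
```

In a real deployment both sides would load files issued by the same internal CA, for example something like make_mtls_server_context("collector.crt", "collector.key", "internal-ca.pem") with hypothetical file names.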

eamonn · December 11, 2017, 5:27pm

As part of the logging infrastructure we will be using RabbitMQ,
as follows:
Filebeat > logstash > rabbitMQ > |firewall| > logstash > elasticsearch

The following questions have been raised:

What protections will be put in place to ensure that only authorized ‘processes’ access queues, and that they ONLY access their OWN queues?
What controls will be put in place to prevent unauthorized message disclosure, deletion, duplication or relay?

magnusbaeck · December 11, 2017, 7:46pm

These are really RabbitMQ questions but I'll answer them quickly.

What protections will be put in place to ensure that only authorized ‘processes’ access queues, and that they ONLY access their OWN queues?
What controls will be put in place to prevent unauthorized message disclosure, deletion, duplication or relay?

All AMQP clients need to authenticate. RabbitMQ supports fairly fine-grained access control over what you're allowed to set up queues against (i.e. what kind of information you can access) and which queues you're allowed to consume from. You could e.g. have a policy where anyone can set up a queue to subscribe to any messages but you can only consume from your own queues.
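RabbitMQ expresses these permissions per user and per virtual host as three regular expressions, configure, write and read (set with `rabbitmqctl set_permissions`). The sketch below models only that idea: a hypothetical shipper in the DMZ may publish to one exchange and nothing else, while a hypothetical Logstash user may only declare and consume queues under its own prefix. Real AMQP operations such as binding touch more than one permission, so treat this as an illustration, not a reproduction of RabbitMQ's authorization logic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Permissions:
    configure: str   # regex over resources the user may declare or delete
    write: str       # regex over resources the user may publish to
    read: str        # regex over resources the user may consume from


# Hypothetical users and resource names for the topology in this thread.
# Patterns are anchored explicitly so they cannot match by accident.
PERMISSIONS = {
    "shipper-dmz": Permissions(configure=r"^$", write=r"^logs\.dmz$", read=r"^$"),
    "logstash-core": Permissions(configure=r"^q\.logstash-core\.",
                                 write=r"^q\.logstash-core\.",
                                 read=r"^q\.logstash-core\."),
}

OPERATION_TO_PERMISSION = {"declare": "configure", "publish": "write", "consume": "read"}


def is_allowed(user, operation, resource):
    """Return True if `user` may perform `operation` on the named resource."""
    perms = PERMISSIONS.get(user)
    if perms is None:
        return False                      # unknown users get no access at all
    pattern = getattr(perms, OPERATION_TO_PERMISSION[operation])
    return re.search(pattern, resource) is not None
```

With these rules the shipper can publish to logs.dmz but cannot consume anything, and the Logstash user can consume q.logstash-core.main but not another team's queue.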

eamonn · December 12, 2017, 12:25pm

removed

magnusbaeck · December 12, 2017, 1:13pm

I try to answer the concrete questions people have but I don't do their homework.
