Enterprise Log Collection Architecture with Elastic

I am tasked with designing a log collection architecture that collects application and infrastructure logs from a number of Linux and Windows servers and databases. These are virtualized, clustered, and load balanced, and are separated from the tooling environment by firewalls.
I need to understand if I need a local log collector in the remote environments, and how this would work with NSX, ESX, and JBoss.
It was suggested to me to use the following configuration:
Logstash > RabbitMQ > |firewall| > Logstash > Elasticsearch
Alternatively, in the cluster of 9 RHEL servers, could/should I deploy a syslog server?
Also, should I be considering Beats instead of Logstash?

Any advice would be very welcome

I need to understand if I need a local log collector in the remote environments.

That's recommended and often the only option.

It was suggested to me to use the following configuration:
Logstash > RabbitMQ > |firewall| > Logstash > Elasticsearch

That's a good option. So the second Logstash instance can connect to the RabbitMQ instance on the other side of the firewall?

Also, should I be considering Beats instead of Logstash?

For collecting text logs I'd recommend Filebeat because of its lower overhead (disk, RAM, and probably CPU too).

I am wondering the following:

  1. What does adding RabbitMQ to this architecture give us that Logstash cannot do on its own?
  2. Do I need to cluster and/or load balance Logstash for resilience?
  3. Does the same apply to Elasticsearch and Kibana?

What does adding RabbitMQ to this architecture give us that Logstash cannot do on its own?

If you place a RabbitMQ broker outside the firewall it becomes a common service that hosts both inside and outside the firewall can connect to. I'd prefer that over having hosts outside the firewall connect to Logstash inside the firewall, or having Logstash outside the firewall connect to Elasticsearch on the inside.

It also makes it trivial to run multiple Logstash instances that feed from the same queue, increasing fault tolerance and load balancing.
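To illustrate the idea of several instances feeding from the same queue, here is a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a RabbitMQ queue and threads as stand-ins for Logstash instances; the worker names and the `run_workers` helper are hypothetical, and a real broker adds acknowledgements and persistence that this does not model.

```python
import queue
import threading


def run_workers(events, worker_count=2):
    """Drain one shared queue with several workers; return {worker: [events]}."""
    shared = queue.Queue()
    for event in events:
        shared.put(event)

    # Hypothetical worker names standing in for Logstash instances.
    handled = {f"logstash-{i}": [] for i in range(worker_count)}

    def consume(name):
        while True:
            try:
                event = shared.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            handled[name].append(event)  # each event goes to exactly one worker

    threads = [threading.Thread(target=consume, args=(name,)) for name in handled]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Each event is taken by exactly one worker, so adding a worker spreads the load, and the remaining workers keep draining the queue if one stops.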

Do I need to cluster and/or load balance Logstash for resilience?

Logstash can't be clustered per se but you can run multiple instances that process events.

Does the same apply to Elasticsearch and Kibana?

Kibana can't be clustered. Whether you need to cluster ES depends on how much data you need to process and what level of fault tolerance you need.

Thank you very much for your assistance so far. It is much appreciated.

Am I correct in saying that Filebeat will do the load balancing once it is aware of the Logstash instances?

Am I correct in saying that Filebeat will do the load balancing once it is aware of the Logstash instances?

Yes, the Beats programs can distribute events between multiple Logstash servers, creating a somewhat crude form of load balancing.
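As a rough illustration of that kind of distribution, the sketch below hands batches of events to a list of hosts in turn. It is only a conceptual model, not how Beats is implemented; the `distribute` helper and the host names are hypothetical. In Filebeat itself this behaviour is governed by the `loadbalance` setting of the Logstash output.

```python
from itertools import cycle


def distribute(batches, hosts):
    """Assign each batch to the next host in turn (simple round-robin)."""
    assignment = {host: [] for host in hosts}
    for batch, host in zip(batches, cycle(hosts)):
        assignment[host].append(batch)
    return assignment


# Hypothetical Logstash endpoints; 5044 is the port conventionally used for Beats.
example = distribute(
    ["batch-1", "batch-2", "batch-3"],
    ["logstash-a:5044", "logstash-b:5044"],
)
```

Round-robin ignores how busy each host is, which is one reason this form of load balancing can be called crude.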

Hi Magnus,
The log collection system will be collecting application and OS logs from Windows 2012 and RHEL VMs as well as several network devices.
I am advised to use Filebeat for the RHEL OS and application logs, and I am wondering:

  1. Is this a good idea?
  2. Should I have two agents, one for the OS and one for applications, or would this double the overhead for no benefit?
  3. Is there a Filebeat template for standard RHEL log file input?
  4. There is also an Exadata appliance to be monitored, which will send Linux OS and DB audit logs. Exadata suggests using syslog to send the OS log files to the collector. Is this OK?

Rgds

Eamonn

Is this a good idea?

Sure.

Should I have two agents, one for the OS and one for applications, or would this double the overhead for no benefit?

I don't think there are any runtime benefits of running two Filebeat instances, but deployment-wise you might want to deploy OS-specific configurations as part of the machine provisioning while configuration of application log collection is owned by the application deployment scripts.

Is there a Filebeat template for standard RHEL log file input?

What do you mean by template?

There is also an Exadata appliance to be monitored, which will send Linux OS and DB audit logs. Exadata suggests using syslog to send the OS log files to the collector. Is this OK?

And by collector you mean Logstash? Sure, that's fine.

By template I mean RHEL OS-specific configuration templates for Filebeat.

Should I be looking at Auditbeat as well?

Also, do you know how I can collect logs from IBM SAN controllers, SAN directors, and SAN storage arrays? I am finding it hard to get information on this.

By template I mean RHEL OS-specific configuration templates for Filebeat.

None that I know of.

Should I be looking at Auditbeat as well?

If you want to collect that kind of information, sure. I can't answer for your needs.

Do you know how I can collect logs from IBM SAN controllers, SAN directors, and SAN storage arrays? I am finding it hard to get information on this.

Most appliances support syslog; other than that I don't know.

Hi Magnus

In the configuration suggested above:

Filebeat > Logstash > RabbitMQ > |firewall| > Logstash > Elasticsearch

Can you explain the purpose of the first 'local' instance of Logstash? Why is it needed, and what value do you see it adding?

It's quite simple: Filebeat can't send to RabbitMQ on its own. If you choose Kafka or Redis instead of RabbitMQ you don't need that Logstash instance since Filebeat has native support.

Is this true for Winlogbeat and syslog also?

Is there any particular advantage to doing the filtering in the local Logstash instance before transit to the remote Logstash/Elasticsearch platform?

Is this true for Winlogbeat and syslog also?

Winlogbeat and Filebeat behave the same in this regard. Syslog daemons typically only send over the syslog protocol (but you could configure Filebeat to read the logs from disk). Logstash can capture syslog messages sent over the network.

Is there any particular advantage to doing the filtering in the local Logstash instance before transit to the remote Logstash/Elasticsearch platform?

Generally I'd say no.

Hi Magnus

I want to send syslog log files securely from my VMware servers through my management zone firewall to Logstash.
I need to understand how this could work:

  • Do I need to use syslog-ng, or can Logstash integrate securely with syslog?
  • Can Logstash take input from syslog-ng?
  • Will I need to deploy syslog (or syslog-ng) servers?
  • Will I be able to use mutual certificate authentication?

Do I need to use syslog-ng, or can Logstash integrate securely with syslog?

The syslog daemons I know about are named rsyslog and syslog-ng. Both will work with Logstash but authentication might be tricky to set up.

Can Logstash take input from syslog-ng?

Yes, e.g. with a tcp, udp, or syslog input.
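For a feel of what arrives at such a network listener, the following sketch emits a syslog-formatted message over UDP using Python's standard `logging.handlers.SysLogHandler`. The `make_syslog_logger` helper, the host name, and the port are assumptions for illustration; the listener would have to be configured to accept syslog traffic on that port.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host, port, name="app"):
    """Return a logger that sends each record as a syslog message over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        socktype=socket.SOCK_DGRAM,  # UDP; use socket.SOCK_STREAM for TCP
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Hypothetical usage (host and port are placeholders, not real endpoints):
# logger = make_syslog_logger("logstash.example.internal", 5514)
# logger.info("application started")
```

UDP delivery is fire-and-forget, so a TCP listener is the usual choice when losing messages in transit is not acceptable.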

Will I need to deploy syslog (or syslog-ng) servers?

Not sure what you mean. You obviously have to run a syslog daemon on the servers containing the logs, but I'm sure you already do that.

Will I be able to use mutual certificate authentication?

I believe Logstash's tcp input supports SSL peer certificate verification but I've never tried it. I don't know if rsyslog or syslog-ng supports it.
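To make the term concrete: mutual certificate authentication means each side presents its own certificate and verifies the other's. The sketch below shows the two halves with Python's standard `ssl` module, purely as a conceptual illustration. It is not Logstash, rsyslog, or syslog-ng configuration, and the function names and file-path parameters are hypothetical; paths that are omitted are simply skipped so the sketch runs without any certificate files.

```python
import ssl


def receiver_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for the receiving side that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails if the sender has no valid cert
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the senders' certs
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # receiver's identity
    return ctx


def sender_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for the sending side that verifies the receiver and presents a cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # sender's identity
    return ctx
```

The part that makes the authentication mutual is `CERT_REQUIRED` on the receiving side; without it only the receiver is authenticated.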

As part of the logging infrastructure we will be using RabbitMQ, as follows:
Filebeat > Logstash > RabbitMQ > |firewall| > Logstash > Elasticsearch

The following questions have been raised:

  1. What protections will be put in place to ensure that only authorized ‘processes’ access queues, and that they ONLY access their OWN queues?
  2. What controls will be put in place to prevent unauthorized message disclosure, deletion, duplication, or relay?

These are really RabbitMQ questions but I'll answer them quickly.

What protections will be put in place to ensure that only authorized ‘processes’ access queues, and that they ONLY access their OWN queues?
What controls will be put in place to prevent unauthorized message disclosure, deletion, duplication, or relay?

All AMQP clients need to authenticate. RabbitMQ supports fairly fine-grained access control over what you're allowed to set up queues against (i.e. what kind of information you can access) and which queues you're allowed to consume from. You could e.g. have a policy where anyone can set up a queue to subscribe to any messages but you can only consume from your own queues.
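RabbitMQ expresses these permissions per user as three regular expressions (configure, write, read) that are matched against resource names. The sketch below is a simplified model of that idea, not RabbitMQ's implementation; the user names, queue and exchange names, and helper functions are all hypothetical, and real deployments also scope permissions per virtual host.

```python
import re
from dataclasses import dataclass


@dataclass
class Permissions:
    configure: str  # resources the user may declare or delete
    write: str      # resources the user may publish to
    read: str       # resources the user may consume from


# Hypothetical users: a shipper that may only publish to the "logs" exchange,
# and an indexer that may only manage and consume from its own "indexer.*" queues.
PERMISSIONS = {
    "shipper": Permissions(configure="^$", write="^logs$", read="^$"),
    "indexer": Permissions(configure=r"^indexer\.", write=r"^indexer\.", read=r"^(logs|indexer\..*)$"),
}


def can_publish(user, exchange):
    """True if the user's write pattern matches the exchange name."""
    return re.search(PERMISSIONS[user].write, exchange) is not None


def can_consume(user, queue_name):
    """True if the user's read pattern matches the queue name."""
    return re.search(PERMISSIONS[user].read, queue_name) is not None
```

With patterns like these, the shipper can publish log events but cannot read anything back, which addresses the disclosure concern for that account.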

removed

I try to answer the concrete questions people have but I don't do their homework.
