First, that isn't a typo. In fact, I want to be able to scale LS to handle >500,000 (maybe a million) messages per minute once we get out of our POC.
Our basic setup is as follows:
On-prem LS server + Redis > x1 AWS LS > x3 AWS ES servers < x1 AWS Kibana server
On-prem LS server is a 4 x 2.9GHz/8GB box
AWS LS is a 4 x 2.6GHz/16GB box
AWS ES are 4 x 2.5GHz/16GB boxes with 100GB SSD data drives
AWS Kibana is a 2 x 2.5GHz/8GB box
We are shipping Cisco ASA syslogs, aggregated by our firewall team, to the on-prem LS server via a UDP input, then shipping them via lumberjack to localhost, where we forward them to the Redis queue.
**UDP in / Lumberjack out**
input {
  udp {
    port => 6001
    type => "syslog-cisco-asa"
    tags => [ "cisco-asa" ]
  }
} # end input block

output {
  # Output to send input data to logstash-to-redis.conf input
  lumberjack {
    hosts => "localhost"
    port => "5173"
    ssl_certificate => "/etc/ssl/certs/logstash-forwarder.crt"
    codec => json # added because of https://github.com/elastic/logstash-forwarder/issues/381
  }
} # end output block
**Lumberjack in / Redis out**
input {
  lumberjack {
    port => 5173
    #type => "logs" # removed to see if type from source is passed - DLH 10-27-15
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/certs/logstash-forwarder.key"
    codec => json
  }
}

output {
  redis {
    host => "localhost"
    data_type => "list"
    key => "logstash"
    congestion_interval => 1
    congestion_threshold => 20000000
    workers => 16
    # Batch processing requires redis >= 2.4.0
    batch => true
    batch_events => 500
    batch_timeout => 5
  }
}
What we've noticed is that, with the ASA log input enabled, the Redis queue quickly started to fill up. We tweaked our AWS LS input as follows:
input {
  redis {
    host => "AWSLS_servername"
    port => 6379
    threads => 800
    key => "logstash"
    data_type => "list"
    codec => json
  }
}
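One variation we're considering (untested, and I'm assuming our version of the redis input supports batch_count, which I believe needs Redis 2.6+) is pulling events in batches instead of piling on threads:

input {
  redis {
    host => "AWSLS_servername"
    port => 6379
    threads => 8        # far fewer threads than the 800 above
    batch_count => 125  # each poll pulls a batch of events (assumed option; I believe it needs Redis >= 2.6)
    key => "logstash"
    data_type => "list"
    codec => json
  }
}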
However, as soon as we added the filtering/grokking for the ASA format, we noticed that we could no longer keep up: Logstash just couldn't pull records from the Redis queue fast enough. (Note: the filtering and grokking are not our own invention; they are the result of some Google-fu and judicious copy/paste.)
** Filter/grok config moved to the first reply because I exceeded my 5000 characters per message limit **
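In the meantime, here's a stripped-down sketch of the kind of grok we're running (the CISCOFW pattern names come from the stock firewall patterns that ship with Logstash; the exact set we match on is in the reply, so treat this one as illustrative only):

filter {
  if [type] == "syslog-cisco-asa" {
    grok {
      # CISCOFW* patterns are from Logstash's bundled firewall pattern file
      match => [
        "message", "%{CISCOFW106001}",
        "message", "%{CISCOFW106023}",
        "message", "%{CISCOFW302013_302014_302015_302016}"
      ]
    }
  }
}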
We've tweaked AWS LS to run with 2 workers on the elasticsearch output, as shown below:
output {
  elasticsearch {
    host => ["AWSES01","AWSES02","AWSES03"]
    cluster => "itoaelk"
    workers => 2
  }
}
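In case it matters, the next tweak on our list (assuming I'm reading the 1.5.x elasticsearch output options correctly; the flush_size and worker count here are guesses, not tested values) is bumping the output workers and bulk flush size:

output {
  elasticsearch {
    host => ["AWSES01","AWSES02","AWSES03"]
    cluster => "itoaelk"
    workers => 4        # guess: one output worker per core on the AWS LS box
    flush_size => 5000  # guess: larger bulk requests per flush
  }
}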
Is there any way to increase the performance of the filtering/grokking?
What about scaling the AWS LS environment? If we stand up a couple of extra servers and point them at the Redis queue, is that a sufficient (proper?!?) way to expand the environment?
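To make sure I'm picturing that right: each extra AWS LS box would just run a copy of the same Redis input (below), plus the same filter and elasticsearch output, all pulling from the same list:

# Hypothetical second AWS LS node -- identical config, same Redis list
input {
  redis {
    host => "AWSLS_servername"  # same Redis instance and key as the first node
    port => 6379
    key => "logstash"
    data_type => "list"
    codec => json
  }
}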
Once we enable both ASA aggregation points to send to our ELK environment, we'll be handling about 100GB per day of just ASA messages. We'll then need to build out capacity for our application and event logs.
Lots of data + inexperienced ELK admin = lots of crazy questions!
Thanks - Josh