I am using Logstash with RabbitMQ as input, Elasticsearch as output, and the useragent and geoip filters.
The configuration is pretty simple:
input {
  rabbitmq {
    host        => "queue"
    exchange    => "bench"
    queue       => "events"
    durable     => true
    auto_delete => false
    threads     => 1
  }
}

filter {
  useragent {
    source         => "ua"
    target         => "ua"
    lru_cache_size => 5000
  }

  geoip {
    source => "ip"
    target => "geo"
    fields => ["country_name", "continent_code", "city_name", "location", "timezone", "real_region_name"]
  }
}

output {
  elasticsearch {
    hosts           => ["elasticsearch:9200"]
    index           => "events-bench1"
    document_id     => "%{id}"
    document_type   => "events"
    manage_template => false
    flush_size      => 2500
    workers         => 4
  }
}
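For context, each message on the queue is a small JSON document along these lines (the field names match the config above; the values here are made up):

{
  "id": "a1b2c3d4",
  "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "ip": "203.0.113.42"
}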
Without the filters, Logstash fetches 2000 messages per second from RabbitMQ, which is pretty fast. But since I added the filters, Logstash consumes no more than 100 messages per second, and I can see the CPU at 100%.
I tried playing with the "pipeline-batch-size" and "workers" settings, but without success. What would be the best approach to scale Logstash (horizontally is an option) so it can consume at least 500 msgs/sec without burning my servers?
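For reference, the kind of invocation I was experimenting with looked like this (the config path and the exact values are placeholders, not recommendations; -w sets the number of pipeline workers and -b the batch size per worker):

bin/logstash -f /etc/logstash/conf.d/bench.conf -w 8 -b 250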
Here is the RabbitMQ message delivery rate: