Logstash useragent or geoip filter is very slow

I am using Logstash with RabbitMQ as input, Elasticsearch as output, and the useragent and geoip filters.

Configuration is pretty simple:

input
{
    rabbitmq
    {
        host => "queue"
        exchange => "bench"
        queue => "events"
        durable => true
        auto_delete => false
        threads => 1
    }
}

filter
{
    useragent
    {
        source => "ua"
        target => "ua"
        lru_cache_size => 5000
    }

    geoip
    {
        source => "ip"
        target => "geo"
        fields => ["country_name", "continent_code", "city_name", "location", "timezone", "real_region_name"]
    }
}

output
{
    elasticsearch
    {
        hosts => ["elasticsearch:9200"]
        index => "events-bench1"
        document_id => "%{id}"
        document_type => "events"
        manage_template => false
        flush_size => 2500
        workers => 4
    }
}

Without filters, Logstash fetches 2000 messages per second from RabbitMQ, which is pretty fast. But since I added the filters, Logstash consumes no more than 100 messages per second, and I can see the CPU at 100%.

I tried playing with the "pipeline-batch-size" and "workers" settings, but without success. What would be the best approach to scale Logstash (scaling horizontally is an option) so that it can consume at least 500 msgs/sec without burning my servers?

Here is the RabbitMQ message delivery rate:

[Image: RabbitMQ message delivery rate]

From what I understand, doing lookups in the geo database is pretty I/O-expensive. Your filtering is probably stuck waiting for I/O operations on the disk. What sort of hardware are you running it on?

I am running it on my macOS laptop (i7), in Docker containers running on a VirtualBox machine with 4 vCPUs and 4 GB of RAM.

I believed the geoip filter cached the geo-database file! This is a good point; I will put Logstash in a tmpfs folder and try again.

Thank you for the idea.
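
A minimal sketch of a narrower variant of the tmpfs idea, pointing only the geoip database at RAM-backed storage instead of moving the whole Logstash install. The path is hypothetical: it assumes a copy of the database file has been placed on a tmpfs mount such as /dev/shm, and the file name GeoLiteCity.dat is an assumption, not something taken from the setup above. The other options are the same as in the original configuration.

```
filter
{
    geoip
    {
        source => "ip"
        target => "geo"
        # Assumption: the GeoIP database was copied to a tmpfs mount beforehand.
        # /dev/shm/geoip/GeoLiteCity.dat is a hypothetical path and file name.
        database => "/dev/shm/geoip/GeoLiteCity.dat"
        fields => ["country_name", "continent_code", "city_name", "location", "timezone", "real_region_name"]
    }
}
```

If throughput does not change with the database on tmpfs, disk I/O is probably not the bottleneck.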

It is indeed a bit better!

But I am afraid there is a mutex lock, so the Logstash pipeline workers are useless in this case.

I get better results with a single pipeline worker.

Made a PR:

Which solves the mutex issue.