Logstash filter user_agent or geoip very slow

ebuildy · March 2, 2016, 5:24pm

I am using Logstash with RabbitMQ as input, Elasticsearch as output, and the useragent and geoip filters.

Configuration is pretty simple:

input
{
    rabbitmq
    {
        host => "queue"
        exchange => "bench"
        queue => "events"
        durable => true
        auto_delete => false
        threads => 1
    }
}

filter
{
    useragent
    {
        source => "ua"
        target => "ua"
        lru_cache_size => 5000
    }

    geoip
    {
        source => "ip"
        target => "geo"
        fields => ["country_name", "continent_code", "city_name", "location", "timezone", "real_region_name"]
    }
}

output
{
    elasticsearch
    {
        hosts => ["elasticsearch:9200"]
        index => "events-bench1"
        document_id => "%{id}"
        document_type => "events"
        manage_template => false
        flush_size => 2500
        workers => 4
    }
}

Without filters, Logstash fetches 2000 messages per second from RabbitMQ, which is pretty fast. But since I added the filters, Logstash consumes no more than 100 messages per second, and I can see the CPU at 100%.

I tried playing with the "pipeline-batch-size" and "workers" settings, but without success. What would be the best approach to scale Logstash (horizontally is an option) so it can consume at least 500 msgs/sec without burning my servers?

Here is the RabbitMQ message delivery rate:

[Image: RabbitMQ message delivery rate]
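
To put the numbers from this post side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the rates quoted above (2000, 100, and the 500 msgs/sec target). The function names are made up for illustration, and the worker estimate assumes throughput scales linearly with worker count, which is optimistic and will not hold if the filters share a lock or the cores are already saturated.

```python
import math

def per_message_ms(rate_per_sec: float) -> float:
    """Average time budget per message, in milliseconds, at a given throughput."""
    return 1000.0 / rate_per_sec

def workers_needed(current_rate: float, target_rate: float) -> int:
    """Units of capacity needed to reach target_rate, ASSUMING perfectly
    linear scaling (no shared locks, enough free CPU cores)."""
    return math.ceil(target_rate / current_rate)

print(per_message_ms(2000))       # 0.5  (without filters)
print(per_message_ms(100))        # 10.0 (with useragent + geoip)
print(workers_needed(100, 500))   # 5
```

In other words, the filters add roughly 9.5 ms per message, and reaching 500 msgs/sec means either cutting that to about 2 ms per message or running about five times the current capacity in parallel.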

arizonawayfarer · March 2, 2016, 9:43pm

From what I understand, doing lookups in the geo database is pretty IO expensive. Your filtering is probably stuck waiting for IO operations on the disk. What sort of hardware are you running it on?

ebuildy · March 3, 2016, 6:54am

I am running it on my MacOS i7 laptop, in Docker containers running on a VirtualBox machine with 4 vCPUs and 4 GB of RAM.

I thought geoip cached the geo-database file! This is a good point; I will put Logstash in a tmpfs folder and try again.

Thank you for the idea.
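
For readers wondering what a lookup cache buys here, the following is a minimal Python sketch of the general idea behind a setting like `lru_cache_size => 5000` in the useragent block above. It is not the geoip or useragent filter's code: `slow_geo_lookup` and `cached_geo_lookup` are hypothetical names, and the returned data is a placeholder. Whether a given geoip plugin version caches lookups at all is not established in this thread.

```python
from functools import lru_cache

lookup_calls = 0  # counts how often the expensive path actually runs

def slow_geo_lookup(ip: str) -> dict:
    """Hypothetical stand-in for an expensive database lookup."""
    global lookup_calls
    lookup_calls += 1
    return {"ip": ip, "country_name": "unknown"}  # placeholder result

@lru_cache(maxsize=5000)  # same size as lru_cache_size in the config above
def cached_geo_lookup(ip: str) -> tuple:
    # Return an immutable tuple so callers cannot mutate the cached value.
    return tuple(sorted(slow_geo_lookup(ip).items()))

# Repeated IPs are served from the cache instead of hitting the slow path.
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"]:
    cached_geo_lookup(ip)

print(lookup_calls)                         # 2
print(cached_geo_lookup.cache_info().hits)  # 2
```

A cache like this only helps when the same IPs or user agent strings recur often; a stream of mostly unique values would still pay the full lookup cost each time.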

ebuildy · March 3, 2016, 10:57am

It's a bit better indeed!

But I am afraid there is a mutex lock, so running multiple Logstash pipeline workers is useless in this case.

I get better results with a single pipeline worker.
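
To illustrate why a shared mutex would make extra pipeline workers pointless, here is a small Python sketch. It is only a model of the suspected behaviour, not Logstash or plugin code: `peak_concurrency` and its parameters are invented for this example, and `time.sleep` stands in for the real lookup work. It measures how many workers are ever inside the lookup at the same time.

```python
import threading
import time
from contextlib import nullcontext

def peak_concurrency(workers: int, shared_lock=None, events_per_worker: int = 3) -> int:
    """Run `workers` threads and return the largest number that were
    inside the simulated lookup at the same moment."""
    guard = shared_lock if shared_lock is not None else nullcontext()
    counter_lock = threading.Lock()  # protects the bookkeeping only
    state = {"active": 0, "peak": 0}

    def lookup():
        with counter_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # stand-in for the real lookup work
        with counter_lock:
            state["active"] -= 1

    def worker():
        for _ in range(events_per_worker):
            with guard:  # all workers contend for the same lock, if one is given
                lookup()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]

# With one shared lock, four workers never overlap inside the lookup.
print(peak_concurrency(4, shared_lock=threading.Lock()))  # 1

# Without the lock, the workers are free to overlap.
print(peak_concurrency(4))
```

With the shared lock the lookups run one at a time no matter how many workers exist, so the extra workers add only contention overhead. That would be consistent with a single pipeline worker giving better results.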

ebuildy · March 3, 2016, 12:58pm

Made a PR:

Which solves the mutex issue.
