Performance degradation when using the type field in inputs


(Alessio Martorelli) #1

Hi all,
I've been trying to troubleshoot a few latency issues that I'm having with Logstash.
I'm using Redis as the input source, and for each input file I decided to specify a type. What I've found is that when the index rotates, Elasticsearch tries to run update_mapping for all of these defined types.
I still can't tell whether this is a coincidence or not, but while the update_mapping is running the Redis queue starts to grow, and sometimes the only way to stop it is to restart Elasticsearch or Logstash (or both).

I'm running a one-node-only cluster with a 20g heap size. Would it be any better if I removed these types and used tags only instead?
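To make the question concrete, the inputs currently look roughly like this (the path and name here are just illustrative, not my real config):

file {
    path => "/var/log/someapp/*.log"
    type => "someapp"        # a dedicated type per input file
}

and the change I have in mind would be along these lines:

file {
    path => "/var/log/someapp/*.log"
    tags => ["someapp"]      # a tag instead of a dedicated type
}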

Logstash version: logstash-1.4.2-1/ logstash-contrib-1.4.2-1
Elasticsearch version: elasticsearch-1.4.4-1

Many thanks in advance


(Mark Walkom) #2

Updating mappings should be pretty quick and not noticeable. So that seems pretty odd.

What does your config look like?


(Alessio Martorelli) #3

The server has this configuration:

input {
    redis {
        host => "hostname"
        data_type => "list"
        key => "logstash"
        port => 6375
        tags => "from_redis"
    }
    syslog {
        type => "syslog"
        port => 6400
        tags => ["from_rsyslog","network_devices"]
    }
}
filter {
    if "network_devices" in [tags] {
        grok {
            match => ["message", "%{INT:number}: %{HOSTNAME:hostname}: %{CISCOTIMESTAMP}: %{GREEDYDATA:loglevel}: %{GREEDYDATA:description}"]
        }
        if [hostname] {
            mutate {
                replace => ["host", "%{hostname}"]
            }
        }
    }
    if "from_redis" in [tags] {
        mutate {
            gsub => ["host", "\..*" , "" ]
        }
    }
    fingerprint {
        source => ["@timestamp","message"]
        method => "SHA1"
        key => "secretkey_for_fingerprint"
        concatenate_sources => true
    }
}
output {
    if "from_redis" in [tags] or "from_rsyslog" in [tags] {
        elasticsearch {
            cluster => "es_clustername"
            flush_size => 5
            document_id => "%{fingerprint}" # !!! prevent duplication
            protocol => "http"
        }
    }
    statsd {
        host => "host"
        namespace => "namespace"
        sender => "logstash"
    }
}

The clients have a simple configuration that reads from files, applies grok filters, and sends the events to Redis.
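Roughly, each client runs something along these lines (the paths, the grok pattern, and the Redis key are placeholders rather than the real values):

input {
    file {
        path => "/var/log/someapp/*.log"
        type => "someapp"
    }
}
filter {
    grok {
        match => ["message", "%{COMBINEDAPACHELOG}"]
    }
}
output {
    redis {
        host => "hostname"
        data_type => "list"
        key => "logstash"
    }
}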

Thank you warkolm, and let me know if you need any other details.


(Mark Walkom) #4

How many types do you have?


(Alessio Martorelli) #5

20 types more or less.


(Mark Walkom) #6

Ahh, well then that might be the problem; with that many types you're likely to see a lot of mapping updates, and mappings of that size can definitely slow things down.

How big is your node? How much data? How many indices?
Have you tried using templates?
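You can get a quick feel for how large the mappings have become, and whether a template is already registered, with something like the following (the index name is just an example, and the template name assumes Logstash's default template management is in use):

curl -s 'localhost:9200/logstash-2015.03.01/_mapping?pretty' | wc -l
curl -s 'localhost:9200/_template/logstash?pretty'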


(Alessio Martorelli) #7

Well, I'm running with a one-node-only configuration and I'm getting quite a lot of data (50M a day). I know it's probably worth scaling up, but I just wanted to make sure it's not something else first.
The server has 64G of RAM and 24 cores. I tend to keep 21 indexes, each of them 16 to 20G in size.

I've never used templates, I can have a look at them.


(Mark Walkom) #8

Are you using the standard 5 shards per index that LS creates? If so, then that is likely the problem, not the mappings!

Update the LS template to create only one or two shards and see how it goes.
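Rather than editing the template Logstash installs, one option is to layer a small template of your own over it, assuming the default logstash-* index naming (the template name here is arbitrary, and it only affects indices created after it is in place):

curl -XPUT 'localhost:9200/_template/logstash_shards' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  }
}'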


(Alessio Martorelli) #9

Ok, I remember I already tried decreasing the number of shards and it didn't help, but I'll give it another try anyway. Otherwise I think my only alternative would be to scale up, since the number of shards also matters when querying Elasticsearch with Kibana.

Thank you warkolm!

