Performance degradation when using the type field in inputs


(Alessio Martorelli) #1

Hi all,
I've been trying to troubleshoot a few latency issues that I'm having with Logstash.
I'm using Redis as the input source, and for each input file I decided to specify a type. What I've found is that when the index rotates, Elasticsearch tries to run update_mapping for all of these defined types.
I still can't tell whether this is a coincidence or not, but while the update_mapping is running the Redis queue starts to grow, and sometimes the only way to stop it is to restart Elasticsearch or Logstash (or both).

I'm running a one-node-only cluster with a 20g heap size. Would it be any better if I removed these types and used tags only instead?
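To make the question concrete, the inputs currently look roughly like this (the path and name here are just illustrative, not my real config):

file {
    path => "/var/log/someapp/*.log"
    type => "someapp"        # a dedicated type per input file
}

and the change I have in mind would be along these lines:

file {
    path => "/var/log/someapp/*.log"
    tags => ["someapp"]      # a tag instead of a dedicated type
}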

Logstash version: logstash-1.4.2-1/ logstash-contrib-1.4.2-1
Elasticsearch version: elasticsearch-1.4.4-1

Many thanks in advance


(Mark Walkom) #2

Updating mappings should be pretty quick and not noticeable. So that seems pretty odd.

What does your config look like?


(Alessio Martorelli) #3

The server has this configuration:

input {
    redis {
        host => "hostname"
        data_type => "list"
        key => "logstash"
        port => 6375
        tags => "from_redis"
    }
    syslog {
        type => "syslog"
        port => 6400
        tags => ["from_rsyslog","network_devices"]
    }
}
filter {
    if "network_devices" in [tags] {
        grok {
            match => ["message", "%{INT:number}: %{HOSTNAME:hostname}: %{CISCOTIMESTAMP}: %{GREEDYDATA:loglevel}: %{GREEDYDATA:description}"]
        }
        if [hostname] {
            mutate {
                replace => ["host", "%{hostname}"]
            }
        }
    }
    if "from_redis" in [tags] {
        mutate {
            gsub => ["host", "\..*" , "" ]
        }
    }
    fingerprint {
        source => ["@timestamp","message"]
        method => "SHA1"
        key => "secretkey_for_fingerprint"
        concatenate_sources => true
    }
}
output {
    if "from_redis" in [tags] or "from_rsyslog" in [tags] {
        elasticsearch {
            cluster => "es_clustername"
            flush_size => 5
            document_id => "%{fingerprint}" # !!! prevent duplication
            protocol => "http"
        }
    }
    statsd {
        host => "host"
        namespace => "namespace"
        sender => "logstash"
    }
}

The clients have a simple configuration that reads from files, applies grok filters, and sends the events to Redis.
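Roughly, each client runs something along these lines (the paths, the grok pattern, and the Redis key are placeholders rather than the real values):

input {
    file {
        path => "/var/log/someapp/*.log"
        type => "someapp"
    }
}
filter {
    grok {
        match => ["message", "%{COMBINEDAPACHELOG}"]
    }
}
output {
    redis {
        host => "hostname"
        data_type => "list"
        key => "logstash"
    }
}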

Thank you warkolm, and let me know if you need any other details.


(Mark Walkom) #4

How many types do you have?


(Alessio Martorelli) #5

20 types more or less.


(Mark Walkom) #6

Ahh, well then that might be the problem; with that many types you're likely to see a lot of mapping updates, and mappings of that size can definitely slow things down.

How big is your node? How much data? How many indices?
Have you tried using templates?
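You can get a quick feel for how large the mappings have become, and whether a template is already registered, with something like the following (the index name is just an example, and the template name assumes Logstash's default template management is in use):

curl -s 'localhost:9200/logstash-2015.03.01/_mapping?pretty' | wc -l
curl -s 'localhost:9200/_template/logstash?pretty'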


(Alessio Martorelli) #7

Well, I'm running with a one-node-only configuration and I'm getting quite a lot of data (50M a day). I know it's probably worth scaling up, but I just wanted to make sure it's not something else first.
The server has 64G of RAM and 24 cores. I tend to keep 21 indexes, each of them 16 to 20G in size.

I've never used templates, I can have a look at them.


(Mark Walkom) #8

Are you using the standard 5 shards per index that LS creates? If so, then that is likely the problem, not the mappings!

Update the LS template to create only one or two shards and see how it goes.
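Rather than editing the template Logstash installs, one option is to layer a small template of your own over it, assuming the default logstash-* index naming (the template name here is arbitrary, and it only affects indices created after it is in place):

curl -XPUT 'localhost:9200/_template/logstash_shards' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  }
}'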


(Alessio Martorelli) #9

Ok, I remember I already tried decreasing the number of shards and it didn't help, but I'll give it another try anyway. Otherwise I think my only alternative would be to scale up, since the number of shards also matters when querying Elasticsearch with Kibana.

Thank you warkolm!

