Does Redis as a broker really help with loading the raw logs?


#1

Hi, can anyone help answer this question? It takes 2 hours or even longer for the server to load one day's Apache access logs from a single file via my Logstash config. Is that normal? Any suggestions?

Thanks!!!


(Christian Dahlqvist) #2

What does your Logstash config look like? How much data is loaded? What is the specification of your Elasticsearch cluster? Which version of Logstash and Elasticsearch are you using?


#3

Thanks for the answers.

First of all, all versions are up to date: Logstash 2.1.0 and Elasticsearch 2.1.

The data size is around 1-3 GB per day (raw Apache logs).

As for the cluster specification, I'm afraid I have no idea.

Most importantly, the Logstash configs look like this:

1. logstash_indexer.conf

input {
  redis {
    host => "127.0.0.1"
    port => 6379
    type => "redis-input"
    data_type => "list"
    key => "logstash-2015.12.09"
    codec => json
    #threads => 5
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "D:/dev/elastic/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float"]
  }
}

output {
  elasticsearch { hosts => ["127.0.0.1:9200"] }
  stdout { codec => rubydebug }
}

2. logstash_shipper.conf

input {
  file {
    path => "D:/dev/elastic/logs//"
    start_position => "beginning"
    #sincedb_path => "/dev/null"
  }
}

output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-%{+yyyy.MM.dd}"
  }
  stdout { codec => rubydebug }
}


(Magnus Bäck) #4

Are the CPUs saturated? If not, increase the number of filter workers (the -w startup option). You'll most likely also want to increase threads and/or batch_count for the redis input. Maybe a handful of threads that each fetch a few hundred messages at a time?
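For illustration, the redis input from the indexer config above could be extended like this (the threads and batch_count values are starting-point guesses, not recommendations — tune them for your hardware):

```conf
input {
  redis {
    host        => "127.0.0.1"
    port        => 6379
    data_type   => "list"
    key         => "logstash-2015.12.09"
    codec       => json
    threads     => 4    # illustrative: parallel fetchers pulling from the list
    batch_count => 250  # illustrative: events fetched per Redis round trip
  }
}
```

With several threads each fetching larger batches, the input stage is less likely to be the bottleneck than the grok/geoip filters.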


#5

Thanks for the great input. Could you show me a more detailed example? I'm confused about the filter workers, especially the -w startup option, as well as threads and batch_count. Is it this filter?

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "D:/dev/elastic/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float"]
  }
}

Thanks again!!!


(Magnus Bäck) #6

batch_count and threads are options to the redis input; see the documentation. If you're running Logstash as a daemon, you can usually alter the startup options via /etc/default/logstash or /etc/sysconfig/logstash, or perhaps directly in the init script — it depends on how you run Logstash. If you're starting it from a shell you can, obviously, just add that option. But I think batch_count is going to have the biggest effect on performance.
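For example, assuming a package-based install on Linux (exact file paths vary by distribution):

```shell
# Daemon: add the option to the defaults file the init script reads, e.g.
#   /etc/default/logstash   (Debian/Ubuntu)
#   /etc/sysconfig/logstash (RHEL/CentOS)
# by appending something like:
#   LS_OPTS="-w 4"

# Shell: pass the option directly when starting Logstash:
bin/logstash -w 4 -f logstash_indexer.conf
```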


(Christian Dahlqvist) #7

As you are running Logstash 2.1, you may not need to worry about the filter workers. While this defaulted to 1 before version 2.0, it now adjusts to the number of cores on the host. Start by adjusting the redis input parameters as Magnus suggested and see what improvement that gives.

