Hi, can anyone help answer this question? It takes 2 hours or even longer for the server to load one day's Apache access logs from a single file via my Logstash config. Is that normal? Any suggestions?
Thanks!!!
What does your Logstash config look like? How much data is loaded? What is the specification of your Elasticsearch cluster? Which version of Logstash and Elasticsearch are you using?
Thanks for the answers.
First of all, all versions are up to date: Logstash 2.1.0, Elasticsearch 2.1.
The data size is around 1-3 GB per day (raw Apache logs).
As for the cluster, I'm afraid I have no idea about that.
Most importantly, my Logstash config looks like this:
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "D:/dev/elastic/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}
output {
  elasticsearch { hosts => ["127.0.0.1:9200"] }
  stdout { codec => rubydebug }
}
}
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-%{+yyyy.MM.dd}"
  }
  stdout { codec => rubydebug }
}
Are the CPUs saturated? If not, increase the number of filter workers (the -w startup option). You'll most likely also want to increase threads and/or batch_count for the redis input. Maybe a handful of threads that each fetch a few hundred messages at a time?
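For what it's worth, a sketch of what that might look like on the indexer's redis input. The key and the numbers here are just illustrative starting points to tune, and the key must match whatever key your shipper's redis output writes to:

```
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash"     # hypothetical; must match the shipper's output key
    threads => 4          # a handful of parallel consumers
    batch_count => 250    # fetch a few hundred events per Redis round trip
  }
}
```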
Thanks for the great inputs. Could you show me a more detailed example? I am confused about the filter workers, especially the -w startup option, as well as the threads and batch_count settings. Do they go in this filter block?
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    database => "D:/dev/elastic/GeoLite2-City.mmdb"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}
Thanks again!!!
No, batch_count and threads are options to the redis input, not the filters. See the documentation. If you're running Logstash as a daemon you can usually alter the startup options via /etc/default/logstash or /etc/sysconfig/logstash, or perhaps directly in the init script. It depends on how you run Logstash. If you're starting it from a shell you can, obviously, just add that option. But I think batch_count is going to have the biggest effect on the performance.
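If it helps, on a package-based install that usually amounts to editing one line in the defaults file; the exact variable name depends on the packaging version, so treat this as a sketch:

```
# /etc/default/logstash (or /etc/sysconfig/logstash on RPM systems)
# Variable name may differ by package version.
LS_OPTS="-w 4"
```

If you start Logstash by hand instead, the equivalent is just adding the flag to the command line, e.g. bin/logstash -f indexer.conf -w 4 (where indexer.conf stands for whatever your config file is called).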
As you are running Logstash 2.1 you may not need to worry about the filter workers. While this defaulted to 1 before version 2.0, it now adjusts automatically based on the number of cores on the host. Start by adjusting the redis input parameters as Magnus suggested and see what improvement that gives.