Hello,
I have two Ubuntu instances on Amazon EC2 with the following configuration:
Memory - 62.9G
Hard Disk - 197G
Hard Disk Available - 184G
Hard Disk Type - SSD
CPU - 8 cores
The first instance has Logstash set up. It reads Apache logs; there are 10 such logs, each with 140,000,000 lines (14 crore).
When Logstash is started, the CPU spikes to more than 150%.
If I start two instances, one shows 150% and the other more than 200%.
I have also tried with 10 such instances (all toggling between 200% and 250%).
We have written a simple grok filter that parses the logs and pushes them to Elasticsearch (set up on the other instance).
Also, Graphite shows around 3,500 events processed per minute. Is that rate appropriate?
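To put that number in perspective, here is a quick back-of-the-envelope calculation of how long the full data set would take at that rate (assuming the 3,500 events/min figure stays constant, which it may not):

```python
# Rough throughput estimate -- assumes the observed rate holds steady.
total_lines = 10 * 140_000_000   # 10 logs, 140,000,000 (14 crore) lines each
rate_per_min = 3_500             # events/min as reported by Graphite

minutes = total_lines / rate_per_min
days = minutes / (60 * 24)
print(f"{minutes:,.0f} minutes, i.e. about {days:.0f} days")
```

At 3,500 events per minute, 1.4 billion lines would take roughly 278 days, which suggests the pipeline is far below what this hardware should sustain.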
Please let me know how the performance can be improved. Thanks in advance for your help.
Please refer to the Logstash configuration file below:
input {
  file {
    path => "/home/ubuntu/logs/*.txt"
    start_position => "beginning"
    type => "tomcat_accessLogs"
    sincedb_path => "/dev/null"
  }
}

filter {
  if [type] == "tomcat_accessLogs" {
    metrics {
      meter => "events"
      add_tag => "metric"
    }
    grok {
      # The literal [ ] around the timestamp are escaped, the possessive
      # "\s++" is relaxed to "\s+", and the queryString alternation uses a
      # non-capturing "(?:...)" (a bare "(?" is invalid); otherwise the
      # pattern is unchanged.
      match => { "message" => "%{IP:ipaddress}%{SPACE}\[%{HTTPDATE:datetime}\]%{SPACE}%{WORD:method}%{SPACE}%{DATA:request}\s+%{NOTSPACE:protocol}%{SPACE}%{INT:statuscode}%{SPACE}%{INT:size}%{SPACE}(?:%{NOTSPACE:queryString}|\s)%{SPACE}%{USER:user_name}%{GREEDYDATA:user-agent}" }
    }
    date {
      locale => "en"
      match => [ "datetime", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      remove_field => [ "message", "datetime" ]
    }
  }
}

output {
  elasticsearch {
    host => "x.x.x.x"
    protocol => "http"
    port => "9200"
    cluster => "elkcluster"
  }
  graphite {
    metrics => [ "events.rate_1m", "%{events.rate_1m}" ]
  }
}
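For reference, a log line of the shape the grok pattern above is meant to match would look something like this (the values here are made up for illustration, not taken from the real logs):

```text
10.0.0.1 [12/Mar/2015:10:15:32 +0000] GET /app/index.jsp HTTP/1.1 200 5120 ?id=42 bhavani Mozilla/5.0
```

If the real access-log format differs from this (for example, extra quoted fields), grok will tag events with _grokparsefailure, and failed matches can themselves be a major source of CPU load.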
Regards,
Bhavani