Logstash stops after a few hours


#1

All,

I have a few devices sending logs to ELK. Each sends syslog to a specific port, and events are indexed based on the port they arrive on. So far it is all networking gear such as Juniper and Cisco. Everything seems to work great, but after a few hours something stops: Kibana shows no more events. If I restart Logstash, everything starts working again. There is nothing in /var/log/logstash/logstash.log or .err.

Any ideas where to even start looking? CPU peaks at about 5%, memory is nowhere near exhausted, and only about 1% of the drive is used.

Thanks


(Mark Walkom) #2

What version? What JVM?
What does your config look like?


#3

java version "1.8.0_45"

Java(TM) SE Runtime Environment (build 1.8.0_45-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode

logstash-core-1.5.0.rc4

Basic config: three files handle inputs and filtering based on port number, and one file handles the output. Here is a sample of an input file and the output file:

input {
  udp {
    port => 5143
    type => vtc
  }
}

filter {

  ########## Auth Failures ##########

  if [message] =~ "Authentication failure" {
    mutate {
      add_field => [ "vtc-type", "Auth_Failure" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
  else if [message] =~ "User cannot be authenticated" {
    mutate {
      add_field => [ "vtc-type", "Auth_Failure" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
  else if [message] =~ "Login attempt" and [message] =~ "FAILURE " {
    mutate {
      add_field => [ "vtc-type", "Auth_Failure" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
  else if [message] =~ "Unauthenticated user" and [message] =~ "FAILURE " {
    mutate {
      add_field => [ "vtc-type", "Auth_Failure" ]
      add_field => [ "log-type", "vtc" ]
    }
  }

  ########## Auth Success ##########

  else if [message] =~ "Starting session: shell" {
    mutate {
      add_field => [ "vtc-type", "Login_Success" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
  else if [message] =~ "Recorded successful login" {
    mutate {
      add_field => [ "vtc-type", "Login_Success" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
  else if [message] =~ "user ID" and [message] =~ "SUCCESS" {
    mutate {
      add_field => [ "vtc-type", "Auth_Success" ]
      add_field => [ "log-type", "vtc" ]
    }
  }
}

Output file:

output {
  stdout { codec => rubydebug }

  if [log-type] == "vtc" {
    elasticsearch {
      index => "logstash_vtc-%{+YYYY.MM.dd}"
      host => "localhost"
    }
  }
  else if [log-type] == "cisco" {
    elasticsearch {
      index => "logstash_cisco-%{+YYYY.MM.dd}"
      host => "localhost"
    }
  }
  else if [log-type] == "juniper" {
    elasticsearch {
      index => "logstash_juniper-%{+YYYY.MM.dd}"
      host => "localhost"
    }
  }
  else {
    elasticsearch {
      index => "logstash_unknown-%{+YYYY.MM.dd}"
      host => "localhost"
    }
  }
}


(Mark Walkom) #4

There's since been a GA release of 1.5; I'd suggest you upgrade to that first.


#5

Thanks. Because I am fairly new to this, is there a "best way" to upgrade?

Also, not sure if it is related, but I am getting these from a Cisco device:

"Received an event that has a different character encoding than you configured"


(Mark Walkom) #6

How did you install, deb/rpm or zip?


#7

Because I am halfway lazy and wanted to make this repeatable, I created a script to install ELK. The Logstash part is here:

echo 'deb http://packages.elasticsearch.org/logstash/1.5/debian stable main' | sudo tee /etc/apt/sources.list.d/logstash.list

sudo apt-get update
sudo apt-get install -y logstash


(Mark Walkom) #8

sudo apt-get install logstash should do it.


#9

OK, so I did the install and had no issues getting it upgraded, but it still has the same problem. I had one device sending a lot of "Received an event that has a different character encoding than you configured" errors and fixed that. It seemed more stable, but it crashed again last night. The last entry in logstash.log is the same as above, but with no host listed as the source. Other than that, no other errors.


#10

I also replaced the file below, as it seems related, but still no go. It now only stays up for a few minutes and then hangs:

https://raw.githubusercontent.com/driskell/logstash/fe1cb7c91f13fda1661b9471dbcc4bd390b4d487/lib/logstash/pipeline.rb


#11

OK, still having the same issue; here is what was in the logs. What is odd is that it only seems to throw this error under certain circumstances, after some time:

{:timestamp=>"2015-06-11T07:57:02.735000-0800", :message=>"Error: Expected one of #, input, filter, output at line 324, column 1 (byte 4415) after "}
{:timestamp=>"2015-06-11T07:57:02.742000-0800", :message=>"You may be interested in the '--configtest' flag which you can\nuse to validate logstash's configuration before you choose\nto restart a running system."}


(Magnus Bäck) #12

Is something restarting your Logstash daemon? I don't think the configuration reader code path is reached except during initialization.


(Suyog Rao) #13

Are there any non-Logstash config files in your /etc/logstash/conf.d directory? If so, that's the problem.
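Worth checking explicitly: Logstash concatenates every file in that directory, so a leftover backup or editor swap file would produce exactly this kind of parse error, at a line number past the end of your real configs. A quick check, assuming the default deb install path:

```shell
# Print anything in conf.d that would be loaded but is not a .conf file
find /etc/logstash/conf.d -maxdepth 1 -type f ! -name '*.conf' 2>/dev/null
```

If this prints anything, move those files out of the directory and restart Logstash.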


#14

All,

Thanks for the feedback. I checked the /etc/logstash/conf.d directory and there are only a few files there, all of them .conf files. It crashed again, but the log files show nothing. I also checked all the cron jobs to make sure nothing was restarting it. Here are the last lines in all three logs:

.err=
Jun 19, 2015 9:05:51 AM org.elasticsearch.cluster.service.InternalClusterService$UpdateTask run
INFO: [logstash-MbsVmLogAgg-1-1442407170-7960] added {[logstash-MbsVmLogAgg-1-1442407170-7968][OVMa1pHhRZWpqsJIkv_WQA][MbsVmLogAgg-1][inet[/10.0.2.40:9310]]{data=false, client=true},}, reason: zen-disco-receive(from master [[Raa of the Caves][2F51UGUFRZedylo470v86Q][MbsVmLogAgg-1][inet[/127.0.0.1:9300]]])

.log=
{:timestamp=>"2015-06-16T08:41:34.806000-0800", :message=>"SIGTERM received. Shutting down the pipeline.", :level=>:warn}

.stdout=
"message" => "<182>1 2015-06-20T15:20:07.652-08:00 xxx.xxx.xxx RPRM 5248 JserverAudit - INFO |http-443-37|ResourceManager_Audit_Log| USER_CHANGE (NOTICE): SUCCESS [+ TR74bd5] ~ Recorded successful login statistics for 1 [user ID: LOCAL\admin, source: xxx.xxx.xxx.xxx, destination: xxx.xxx.xxx.xxx]\n",
"@version" => "1",
"@timestamp" => "2015-06-20T23:20:09.132Z",
"type" => "vtc",
"host" => "xxx.xxx.xxx.xxx",
"vtc-type" => "Login_Success",
"log-type" => "vtc"


#15

Also, if I try to shut down the service it hangs, and I have to kill the process before I can restart it. This only happens when it "crashes".


#16

OK, so I turned on verbose logging via the -vv option. When it crashed, here was the last line in the .log file:

<14>2015 Jun 24 22: -admin-shell: %WAAS-PARSER-6-350232: CLI_LOG log_cli_command: show flash ", "@version"=>"1", "@timestamp"=>"2015-06-24T22:04:39.372Z", "type"=>"cisco", "host"=>"xxx.xxx.xxx.xxx"}, "log-type"]}>>]]}, :batch_timeout=>1, :force=>true, :final=>nil, :level=>:debug, :file=>"stud/buffer.rb", :line=>"207", :method=>"buffer_flush"}


(Asaf Yigal) #17

We had the same issues with Logstash, and if you look it up you'll see that you are not alone in facing them.

I can tell you that we ended up not using Logstash due to continuous stability issues that were taking too much of our time, but here are a few pointers if you still want to use it:

  1. Take a look at memory consumption - this is a big thing in Logstash. Even on a strong machine it sometimes dies from OutOfMemory, and in that case nothing is written to the log files - the process just hangs.
  2. Take a look at the max message size and how you are sending the logs to Logstash - if events are longer than the buffer (which cannot be extended beyond a certain value), Logstash issues a socket disconnect, and most shippers just give up and do not try to reconnect until you restart Logstash.
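Both points are tunable on this setup. For point 1, the deb package reads the JVM heap size from LS_HEAP_SIZE in /etc/default/logstash. For point 2, since these are udp inputs, the input plugin's buffer_size option caps the datagram size it will accept - a sketch reusing the vtc input from above (the 65536 value is an assumption; size it to your largest expected message):

```conf
input {
  udp {
    port => 5143
    type => vtc
    buffer_size => 65536  # bytes per datagram; assumed value, raise to fit your largest event
  }
}
```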

Hope that helps, and if it doesn't, I'm sure you can find a stable solution for syslog.

-- Asaf.


#18

Well, I upgraded to the latest version and so far so good for over a week. I would be interested in knowing what you moved to instead.


(system) #19