Safely shutting down logstash on AWS auto-scaling group


(Jeff Elliott) #1

I thought I'd share some work I've done on getting logstash to work better with AWS auto-scaling groups. This should generalize to anyone who has their logstash server(s) automatically stop / reboot for any reason.

This note is partly to put people on the right trail, and partly to request commentary from those who have the knowledge to point out mistakes.

Summary: The two issues I had were:
1 - logstash init.d script times out on "stop"
2 - logstash does not gracefully shut down for reboot of instance

Thing 1:
After setting up logstash on an AWS EC2 instance, I'm able to use

sudo service logstash restart

(and start, stop, etc.) to happily control the service.

However, on the t2.micro instance I'm using (for non-AWS folks, that's a really small instance: about 10% of one core and 1GB of RAM), stop / restart don't behave well. In particular, the 5-second shutdown wait in the /etc/init.d/logstash script always times out before logstash finishes shutting down.

I simply changed the

sleep 1

line in the stop() function to

sleep 5

That gives it 25 seconds instead of 5, since the sleep sits inside a wait loop that runs five times. Shutdown seems to take about 8 seconds on this instance. This might be a problem only for super small servers, but I'm processing millions of records, and I want to make sure the queue has time to drain.
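
For anyone who would rather not hard-code the sleep, here is a rough sketch of the same idea with a configurable timeout. This is not what ships in the logstash init script: wait_for_exit is a hypothetical helper name, and the 25-second default is just the value I settled on above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the stock init script):
# poll a pid once a second until it exits or the timeout runs out.
#   $1 = pid to watch
#   $2 = timeout in seconds (assumed default: 25)
wait_for_exit() {
  pid=$1
  timeout=${2:-25}
  elapsed=0
  # kill -0 sends no signal; it only checks that the process still exists
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # still running after the timeout
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0       # process has exited
}
```

In stop(), this would be called right after the SIGTERM is sent, with the pid read from the pid file, and a non-zero return would mean logstash is still running and needs further handling. Polling every second also means a fast shutdown returns quickly instead of always sleeping in 5-second steps.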

Thing 2:
I noticed the next problem when I started wondering what happens when the group auto-scales. Again for the non-AWS folk, you can pretty simply set up a few rules to expand or shrink a group of logstash servers based on demand. Mine are set to scale out when my input queue gets too big, indicating a backlog, and to scale back in when it gets under control again.

But this involves servers being shut down, and I wondered if logstash was being allowed to shut down cleanly.

Short answer: nope!

After instrumenting the script, I concluded that it wasn't being called at all on reboot / shutdown. A bit of research later, and I found out that when using init.d, things that want to be called on shutdown need a lock file in

/var/lock/subsys

So I added lines on start() to

touch /var/lock/subsys/$name

and on stop() to

rm -f /var/lock/subsys/$name

And suddenly logstash was being stopped nicely!
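
Put together, the change looks roughly like the sketch below. The daemon launch and kill logic are elided, and the lockdir variable is my own addition for illustration (the real script can just use the path directly). As I understand it, on Red Hat-style SysV init the rc script only runs a service's stop on shutdown when this lock file is present, which would explain why the stock script was being skipped.

```shell
#!/bin/sh
# Sketch only: $name is defined by the real init script; the default here is an assumption.
name=${name:-logstash}
# lockdir is a hypothetical variable added so the path is easy to override.
lockdir=${LOCKDIR:-/var/lock/subsys}

start() {
  # ... launch the daemon here (elided) ...
  # Mark the service as running so the shutdown sequence calls stop()
  touch "$lockdir/$name"
}

stop() {
  # ... send SIGTERM and wait for the daemon to exit (elided) ...
  # Clear the marker once the service is down
  rm -f "$lockdir/$name"
}
```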

Conclusion:

So I was wondering if everyone has migrated away from init.d to upstart or something else, or perhaps if the init.d that comes with logstash could use some improvement? Or perhaps I've blundered terribly?

Cue folks who know more than me about init.d!

Thanks,
Jeff

