Logstash fails to start with SQS Input

Hey everyone. New to the forum; hope I'm creating this thread in the right place. Been battling this odd problem for a few days and could really use some help.

We have a Logstash server on an AWS EC2 instance, currently taking input from JMS and outputting to OpenSearch. We're attempting to change from the JMS input to an SQS input.

The folks over at AWS support and I went through and verified that everything on the AWS side of things is, in fact, configured correctly (for SQS, I mean). I've also been through every bit of documentation I can find that describes the versions of Logstash and the SQS input plugin we have installed, and I haven't found anything obviously amiss in our config.

The symptom is: when I start the Logstash service and tail the Logstash log, the "Starting Logstash" message appears; then, a minute or two later, it repeats; then, a minute or two after that, it repeats again... and again, and again. That is all that appears in the log; there are no error messages, or any other messages at all, aside from "Starting Logstash".

Watching top during this reveals an ever-rising load average, and if I let it go on long enough, the server eventually becomes entirely non-responsive (requiring a hard reboot).

The server is an EC2 instance running Amazon Linux 2, running Logstash version 6.8.23 with logstash-input-sqs plugin version 3.1.3. Logstash was installed using Amazon's yum repo, and the sqs input plugin was installed using Logstash's logstash-plugin utility.

Note: The server does start up and run perfectly normally (and functions as intended) when using the JMS input and OpenSearch output. The symptom appears when I remove the JMS input from the config file and add the SQS input.
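For reference, the SQS side of the config is roughly the following minimal sketch (the queue name and region here are placeholders, not our real values):

input {
  sqs {
    queue  => "my-logstash-queue"   # placeholder; name of the SQS queue to poll
    region => "us-east-1"           # placeholder; region the queue lives in
  }
}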

In my searching, I haven't found anyone who has reported a similar issue. Has anyone run into this before or have any idea of what could be going on? I've been on this issue for days now and I'm completely out of ideas.

Thank you for reading!

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

There are a few error messages that get written to stderr but not to log4j, because at the point where they are generated (while reading the configuration) log4j has not yet been initialized. Make sure your service manager captures stdout/stderr and review those messages.
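With the package's standard systemd unit, stdout/stderr should land in the journal, so something along these lines will show them:

journalctl -u logstash -f            # follow new output, including stderr
journalctl -u logstash -b --no-pager # everything since the last boot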

@Badger Thank you for the suggestion! That's a good point about log4j. I used journalctl to take a peek at what's in there, and while there's more, it's not particularly helpful...

The following is what journalctl revealed. I've pasted in two repetitions of the pattern; it repeats for as long as Logstash runs with the SQS input defined in the conf file.

Nov 01 18:34:43 ip-10-0-4-244.ec2.internal systemd[1]: Started logstash.
Nov 01 18:35:16 ip-10-0-4-244.ec2.internal logstash[26318]: Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
Nov 01 18:35:18 ip-10-0-4-244.ec2.internal logstash[26318]: [2022-11-01T18:35:18,406][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.23"}
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service: main process exited, code=killed, status=9/KILL
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: Unit logstash.service entered failed state.
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service failed.
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service holdoff time over, scheduling restart.
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: Stopped logstash.
Nov 01 18:35:53 ip-10-0-4-244.ec2.internal systemd[1]: Started logstash.
Nov 01 18:36:26 ip-10-0-4-244.ec2.internal logstash[27081]: Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
Nov 01 18:36:27 ip-10-0-4-244.ec2.internal logstash[27081]: [2022-11-01T18:36:27,967][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.23"}
Nov 01 18:37:00 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service: main process exited, code=killed, status=9/KILL
Nov 01 18:37:00 ip-10-0-4-244.ec2.internal systemd[1]: Unit logstash.service entered failed state.
Nov 01 18:37:00 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service failed.
Nov 01 18:37:01 ip-10-0-4-244.ec2.internal systemd[1]: logstash.service holdoff time over, scheduling restart.
Nov 01 18:37:01 ip-10-0-4-244.ec2.internal systemd[1]: Stopped logstash.

Run into this before?

Thanks!
Garrett

No, I have not. When I Google "logstash.service: main process exited, code=killed, status=9/KILL" I get hits for Elasticsearch failing to start. The few that have solutions say the initial JVM heap size is greater than the amount of memory allocated to the container, so the JVM fails to start.
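If that is what's happening here, the heap settings live in /etc/logstash/jvm.options on a package install; the relevant lines look like this, and both values need to fit within the RAM the instance actually has:

-Xms1g   # initial heap size; the JVM reserves this at startup
-Xmx1g   # maximum heap size; keep both within available memory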

You need to look in the system log.

I think Amazon Linux is Red Hat based, so you need to check /var/log/messages; try to start Logstash to generate new errors, then check the log file.
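Given the status=9/KILL in the journal, the kernel OOM killer is a prime suspect; something like the following (paths per a stock Amazon Linux 2 box) should confirm or rule that out:

dmesg | grep -iE 'out of memory|killed process'      # kernel ring buffer
sudo grep -iE 'out of memory|oom' /var/log/messages  # same evidence in syslog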

Hey @Badger. I found the same thing a bit ago, and as I was staring down the logs attempting to get them to confess (joke), I decided, just for kicks, I'd change it to a bigger instance and try it... just on the off chance it happens to be something that stupid.

...

and sure enough... now it works. Evidently, the SQS input plugin is far hungrier for RAM than the JMS input plugin. I hadn't even thought about it, because when running with the JMS input, the (now previously) tiny instance would run with a load average below 0.1.

So yeah, the solution was: more RAM. :man_facepalming:

Thank you for your help, it is much appreciated. And thank you @leandrojmp - your response posted as I was writing this; much appreciated. Have a great weekend, y'all.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.