Why does Logstash not restart from JDBC I/O errors?

arisbanach · January 7, 2018, 7:43pm

For some reason lately I've been experiencing JDBC input I/O errors in Logstash's logs, but Logstash evidently doesn't do anything when it encounters these. It just hangs and the pipeline can't continue anymore. I have to manually check if it's still processing and when I see Elasticsearch isn't indexing anymore I go back to Logstash, see the I/O error log, restart it and it works for another few million docs or so.

Couldn't Logstash see these errors and restart the connection automatically when it experiences them?

guyboertje · January 8, 2018, 11:03am

What version of the JDBC input are you using? bin/logstash-plugin list --verbose | grep jdbc or similar.

This version 4.3.0 was refactored to open and close connections for each trip to the db (each scheduled run)

arisbanach · January 8, 2018, 2:16pm

logstash-filter-jdbc_streaming (1.0.3)
logstash-input-jdbc (4.3.2)

I could be wrong but I feel like this I/O error doesn't break the connection or something. I think I remember seeing it was still connected to the database I'm pulling from, but the logs were just stopped on that I/O error and nothing else was happening. I had to stop Logstash to kill the connection.

guyboertje · January 8, 2018, 3:11pm

Is there any other backtrace info in the logs?
Is there any indication in the Postgres logs that it restarted a little bit before the log message.
Is there a network proxy or load balancer between LS and the DB server?

Some prior art:
Caused by: java.io.EOFException
https://www.postgresql.org/message-id/001701c53a90$cf879cc0$6600a8c0@kermit
https://www.postgresql.org/message-id/42540D4D.2000504%40hogranch.com
Caused by: java.net.SocketException: Socket closed

https://confluence.atlassian.com/confkb/getting-a-psqlexception-error-an-io-error-occured-while-sending-to-the-backend-due-to-corrupt-postgres-database-162988249.html -> advice: check that the postgres db is not crashing (hw failure)

You could try to set jdbc_properties via the sequel_opts setting...

  sequel_opts => {
    "jdbc_properties" => {
      "socketTimeout" => 20 #seconds
      "tcpKeepAlive" => true
    }
  }

See https://jdbc.postgresql.org/documentation/94/connect.html

arisbanach · January 8, 2018, 6:51pm

Is there any other backtrace info in the logs?

I've included our Postgres logs below.

Is there any indication in the Postgres logs that it restarted a little bit before the log message.

I don't think so, but I'm not sure. The logs I've included are where the errors start.

Is there a network proxy or load balancer between LS and the DB server?

No

2018-01-08 16:17:30 UTC:removed[8341]:LOG: server process (PID 21535) was terminated by signal 11: Segmentation fault

2018-01-08 16:17:30 UTC:removed[8341]:LOG: terminating any other active server processes

2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278] terminating connection because of crash of another server process

2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278]:HINT: In a moment you should be able to reconnect to the database and repeat your command.

repeat above logs a bunch of times

2018-01-08 16:17:30 UTC:removed[8341]:LOG: archiver process (PID 26438) exited with exit code 1

2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450] terminating connection because of crash of another server process

2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:HINT: In a moment you should be able to reconnect to the database and repeat your command.

then repeat below log a bunch of times

2018-01-08 16:17:30 UTC:removed(53166):removed@removed:[21537]:FATAL: the database system is in recovery mode

Do you think that this log:

2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

could be because I often run sudo systemctl stop logstash and maybe the Postgres database couldn't stop the large query very quickly so the processed was killed in a bad way (or something)?

guyboertje · January 9, 2018, 9:33am

Well it does look like your Postgres db is segfaulting.

Sequel uses a connection pool under the hood so I guess the jdbc input's connection is a pool connection lease. I think it is these connections that are throwing the PGException indirectly.

arisbanach · January 9, 2018, 12:56pm

I'm not familiar with the concept of segfaulting. Is that something caused by a bug in Postrgesql, or AWS RDS, or is it because of some misconfiguration on our part? Or it normal to see this if you overload the database somehow?

guyboertje · January 9, 2018, 1:04pm

An abnormal obscure bug of some kind in Postgres I think. How often does it happen?

I am not a DBA type expert.

Try to upgrade to a more recent version of PG and the driver too.

Other than that, you will need to put your Sherlock Holmes hat on and search the internet.

arisbanach · January 9, 2018, 1:57pm

okay thank you

system · February 6, 2018, 1:58pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
JDBC can't connect anymore for no apparent reason Logstash	2	966	January 8, 2018
Logstash JDBC input plugin: Java::OrgPostgresqlUtil::PSQLException: An I/O error occurred while sending to the backend Logstash	1	336	January 20, 2023
Systemd restarts logstash when using jdbc input Logstash	6	2021	December 30, 2017
Handle SQL JDBC input errors Logstash	4	550	June 19, 2018
SQL connection exception continuing in Logstash Logstash	1	929	March 10, 2017

Why does Logstash not restart from JDBC I/O errors?

Related topics