Retry Logstash connection to a DB after failed attempt

Hello everyone,

We have set up a Logstash connector to gather data from a DB every day at a fixed time (7 a.m.). Now, we have included the following parameters in the input.conf file (some parameters are masked for obvious security reasons).

What we are trying to achieve is the following: the connector should try to contact the DB at 7 a.m. (as specified in the schedule parameter). If the connection fails, it should then make 5 more connection attempts (as specified in the connection_retry_attempts parameter). Between attempts, there should be a waiting time of 300 seconds (as specified in the connection_retry_attempts_wait_time parameter). If at any point the connection is accepted and the data is ingested, the connector should not attempt to connect to the DB anymore. The whole process should then start again the following day at 7 a.m.

What we are getting instead is the following: the connector does indeed try to connect to the DB at 7 a.m. However, it makes only a single connection attempt, regardless of whether the connection is established or not. So it doesn't seem to be taking the connection_retry_attempts and connection_retry_attempts_wait_time parameters into account. In fact, I'm pretty sure that we could comment out those parameters and the behavior of the connector would not change.

Now, we thought in the beginning that the connection_retry_attempts_wait_time was too short (it was set to 30 seconds, which is actually less time than the query takes to run). However, after changing it to 300 seconds (as shown in the picture above) nothing seems to have changed. I have found a couple of discussions about this on the web and even a GitHub issue has been opened in the past (Retry Establishing Connection if it Fails · Issue #91 · logstash-plugins/logstash-input-jdbc · GitHub). However, no one seems to have found a solution for this (or I have not been able to find it).

So, is it possible to achieve the behavior we want by changing the connector's configuration? And if it is, how should we modify our input.conf file so that it works as expected?

Thanks for your help,

Davide

If you look at the code, you can see it always logs an error if it catches one of the two exceptions it is designed to catch. If it does not log one of those messages then a different exception may have been thrown, and that would get caught further up the stack, not triggering the retry.

Every time the schedule of the input runs it calls execute_query, which calls execute_statement, which is the function in the mixin which opens the connection and has the retry logic.
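To illustrate why a different exception bypasses the retry, here is a minimal Python sketch of the general pattern. It is not the plugin's actual Ruby code: the exception classes and the open_with_retry helper are hypothetical names invented for this example. Only the exception type listed in the except clause triggers another attempt; anything else propagates to the caller after a single try.

```python
import time


class ConnectionFailed(Exception):
    """Hypothetical stand-in for the connection errors the retry logic catches."""


class QueryFailed(Exception):
    """Hypothetical stand-in for an error raised while the query itself runs."""


def open_with_retry(connect, attempts, wait_seconds, sleep=time.sleep):
    """Call connect() up to `attempts` times, retrying only on ConnectionFailed."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionFailed as err:
            # Only this exception type is caught, so only it is retried.
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt == attempts:
                raise
            sleep(wait_seconds)
        # Any other exception (for example QueryFailed) is not caught here,
        # so it leaves the function immediately and no retry happens.
```

With this shape, a connect function that raises QueryFailed is called exactly once, no matter how large attempts is, which matches the behavior described above.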

Thanks for your reply.

So, if I'm understanding this correctly, it seems like the exception thrown in our specific case is not handled in the function which has the retry logic in it, thus resulting in the connection not being attempted again.

This is the exception being thrown:
Exception when executing JDBC query {:exception=Java::JavaSql::SQLException: ORA-12801: error signaled in parallel query server P008

If we find a fix for this problem I will post it here. At this point there is probably nothing that can be done through configuration, so we may have to create a script that stops and starts the service automatically.

Thanks a lot for your help!

OK, so ORA-12801 tells you that an error occurred in a parallel query. What error occurred? I would expect the next line in the log to be another Oracle error code, which is the actual problem you need to fix. If not, this has some hints on how to get the actual error logged.

Hi,

Sorry for not replying in a while. In the end, the database will be migrated soon. The issue we are experiencing should not occur with the new infrastructure, so we have decided to wait and see for now.

If we experience the same issue again, we will probably solve it with a script that does the following: it starts the system service for Logstash every day at a fixed hour, then reads the logs. If an error occurs it restarts the service; otherwise it lets the service run until the data has been ingested. It is basically a simple fix that replicates the retry function externally.
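A rough Python sketch of such a watchdog could look like the following. Several things here are assumptions rather than tested facts: the systemd unit is assumed to be named logstash, the log path is supplied by the caller, the error marker is taken from the log line quoted earlier in this thread, and the pipeline is assumed to run its query when the service starts (for example because no schedule is set), so that a restart really does trigger a new attempt. It also simplifies "until the data has been ingested" to "no new error appeared during one wait interval".

```python
import os
import subprocess
import time

# Marker taken from the log line quoted earlier in the thread.
ERROR_MARKER = b"Exception when executing JDBC query"
SERVICE = "logstash"  # assumed systemd unit name


def restart_service():
    """Restart the Logstash service (assumes systemd and sufficient privileges)."""
    subprocess.run(["systemctl", "restart", SERVICE], check=True)


def new_error_since(log_path, offset):
    """Return (found, new_offset) for log content appended after `offset`."""
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
        return ERROR_MARKER in chunk, fh.tell()


def watch(log_path, max_restarts=5, wait_seconds=300,
          restart=restart_service, sleep=time.sleep):
    """Restart the service each time a new error shows up in the log.

    Stops after `max_restarts` restarts, or as soon as one wait interval
    passes without a new error. Returns the number of restarts performed.
    """
    offset = os.path.getsize(log_path)  # ignore entries from earlier runs
    restarts = 0
    while restarts < max_restarts:
        sleep(wait_seconds)
        found, offset = new_error_since(log_path, offset)
        if not found:
            break  # no new error: let the service keep running
        restart()
        restarts += 1
    return restarts
```

The script itself would be launched once a day (for instance from cron, shortly after the service is started) with the path to the Logstash log file. The restart and sleep parameters exist only so the loop can be exercised without touching a real service.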

In the end, the issue is related to the specific error thrown by our DB.

Thanks again for your help!
