[MSSQL module] - Transaction log stop after few hours

I was able to configure the mssql module on metricbeat. I use the ssl config for each Elasticsearch and Kibana. I was able to load the default dashboards in kibana. The thing is that I can see transaction log data for a short period of time :


(each time data starts to be printed is when I manualy restarted metricbeat)
But I can see data for the Performance metric for the same period :
image
Plus, there is no error in journalctl and /var/log/metricbeat/metricbeat

Is anybody knows why Transaction log data stop to be send to Elasticsearch and how can I fix it ?

Hi @Delta32000 :slightly_smiling_face:

Can you check in Discover that you actually have data on those periods? What we want here is to know if it's an error in the dashboard visualization or the module is faulty for whatever reason.

Also check that your sys.dm_db_log_stats, sys.dm_db_log_space_usage and sys.databases dbs in sql server is being updated. I mean, if you are not having transactions because you are mostly executing single statements and "selects", it might be the expected output.

Hey @Mario_Castro

Thanks for your reply !
For the same periode in Discover (with a filter mssql.transaction_log.space_usage.total.bytes:exists on metricbeat-* index pattern) I can see that :


And with a filter on mssql.performance.logins_per_sec :

I have checked sys.dm_db_log_stats manually with any database_id during a similar "STOP" period, and I can have data.

I try to monitor a huge database with a lot of transaction per minute and with about a transaction_log backup each 10 minutes. I tried to find any error on winlog / sql server / server itself and elastic host server but I can't relate anything with those blank space period...

When I check with

journalctl -xe --unit "metricbeat" | grep transaction_log

I have 0 row unless I restart metricbeat (then I have 1 more row each 10 sec as expected)

sudo systemctl restart metricbeat.service

I am kinda new in the elastic stack world so maybe I missed something :thinking:

[UPDATE]

I tried running metricbeat on an other sqlserver instance with only mssql module enabled. I connected this metricbeat to a brand new elastic stack VM with only basics (not even the minimal security was set).

And with this configuration I'm facing the same issue... after a random moment transaction_log monitoring just stops when the performance one continue his life.

I'll try to connect two instances of metricbeat to the same mssql instance (and two Elasticsearch) just to see if both stop at the same moment.

Maybe it's a version mismatch. The Stack cannot be automatically tested to detect regressions to newer MSSQL versions. Which version are you using?

That table uses the mssql.transaction_log.space_usage.total.bytes Elasticsearch field which is populated one by one by running USE db; SELECT * FROM sys.dm_db_log_space_usage; and getting the value of total_log_size_in_bytes which is the second column in a 2017 version.

I'm thinking if the problem can be that it executed a USE db just before and somehow there's a race condition.

Does that rings any bell?

Actually we are running 3 sql server instances.
Each one are in 12.0.6024.0, 12.0.6372.1 and 12.0.6164.21 version.
I used the "hosts" parameter of mssql.yml to connect to both 12.0.6024.0 and 12.0.6372.1 using an sql server connection string.
After a while restarting the module manually, it looks like the first one still successfully connected while the second one still "crashing" after a moment.
When I try :

    USE [DB];
    SELECT * FROM sys.dm_db_log_space_usage;

I can't see any difference between each table structure.

Weird thing :
The green part is from the first DB server and the grey one from the second. I left Metricbeat running on his own for a month without changing anything (same for DB servers). It seams that the grey part just poped again two weeks ago... (And that for all DB from the second server)
image
I checked and we haven't proceeded any update.
Zooming in we can notice a tiny amount of data before it actually works again...
image
Nothing append on sqlServer logs or Winlogs during that period.

I also checked that we got enough disk space on the ELK server

So for the moment it works fine but I still don't know what append to our data...

Best regards,

Dorian

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.