JDBC log parsing


#1

Hi All,

I would like to parse a JDBC log, and I'm just learning the process. The log looks like this:

40703|master|180615-09:36:22:315|7 ms|statement|UPDATE tasks SET p_runningonclusternode = ? WHERE p_runningonclusternode = ? AND PK = ? AND hjmpTS = ? |UPDATE tasks SET p_runningonclusternode = 0 WHERE p_runningonclusternode = -1 AND PK = 8796125856694 AND hjmpTS = 1076364
40703|master|180615-09:36:22:322|5 ms|statement|UPDATE taskconditions SET p_consumed = ? WHERE p_task = ?|UPDATE taskconditions SET p_consumed = 1 WHERE p_task = 8796125856694
40703|master|180615-09:36:22:328|1 ms|statement|SELECT item_t0.PK FROM tasks item_t0 WHERE ( item_t0.p_trigger = 8796125856246) AND (item_t0.TypePkString=? )|SELECT item_t0.PK FROM tasks item_t0 WHERE ( item_t0.p_trigger = 8796125856246) AND (item_t0.TypePkString=8796107178066 )
40703|master|180615-09:36:22:333|2 ms|statement|UPDATE triggerscj SET hjmpTS = ? ,modifiedTS=?,p_activationtime=? WHERE PK = ? AND (sealed IS NULL OR sealed=0)|UPDATE triggerscj SET hjmpTS = 538183 ,modifiedTS='2018-06-15 09:36:22.329',p_activationtime='2018-06-15 09:36:23.0' WHERE PK = 8796125856246 AND (sealed IS NULL OR sealed=0)
40703|master|180615-09:36:22:337|2 ms|statement|UPDATE tasks SET hjmpTS = ? ,modifiedTS=?,p_executiontimemillis=? WHERE PK = ? AND (sealed IS NULL OR sealed=0)|UPDATE tasks SET hjmpTS = 1076365 ,modifiedTS='2018-06-15 09:36:22.334',p_executiontimemillis=1529080583000 WHERE PK = 8796125856694 AND (sealed IS NULL OR sealed=0)
40703|master|180615-09:36:22:341|3 ms|commit||
40703|master|180615-09:36:22:344|1 ms|statement|SELECT * FROM triggerscj WHERE PK=?|SELECT * FROM triggerscj WHERE PK=8796125856246
40703|master|180615-09:36:22:347|1 ms|statement|SELECT * FROM tasks WHERE PK=?|SELECT * FROM tasks WHERE PK=8796125856694
40703|master|180615-09:36:22:354|5 ms|statement|UPDATE tasks SET hjmpTS = ? ,modifiedTS=?,p_retry=? WHERE PK = ? AND (sealed IS NULL OR sealed=0)|UPDATE tasks SET hjmpTS = 1076366 ,modifiedTS='2018-06-15 09:36:22.348',p_retry=538183 WHERE PK = 8796125856694 AND (sealed IS NULL OR sealed=0)
40703|master|180615-09:36:22:360|4 ms|statement|UPDATE tasks SET p_runningonclusternode = ? WHERE PK = ?|UPDATE tasks SET p_runningonclusternode = -1 WHERE PK = 8796125856694
40703|master|180615-09:36:22:364|2 ms|statement|DELETE FROM tasks_aux_queue WHERE PK IN (?)|DELETE FROM tasks_aux_queue WHERE PK IN ('8796125856694')
40703|master|180615-09:36:22:371|6 ms|commit||
61|master|180615-09:36:22:386|0 ms|statement|SELECT isInitialized FROM metainformations WHERE PK = ?|SELECT isInitialized FROM metainformations WHERE PK = 15481123719086096
61|master|180615-09:36:22:389|1 ms|statement|SELECT * FROM tasks WHERE PK=?|SELECT * FROM tasks WHERE PK=8796125856694
38|master|180615-09:36:22:791|4 ms|statement|DELETE FROM tasks_aux_workers WHERE ID = ? AND IS_ACTIVE = 0|DELETE FROM tasks_aux_workers WHERE ID = 0 AND IS_ACTIVE = 0

Looking at the lines, the fields appear to be:

PID | Node? | timestamp | duration | command type? | sql statement | additional sql statement?

Does anyone know how I can confirm what each field corresponds to? The first field looks like it could be a PID, but values less than 100 seem odd. Thoughts?
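As a quick sanity check outside Logstash, splitting a sample line on the pipe character confirms there are seven fields. This is just a Python sketch; the field names below are my guesses from the log contents, not confirmed, and a blind split would break if a statement ever contained a literal "|":

```python
# Split one sample log line on "|" and label the pieces.
# Field names are guesses, not confirmed by any documentation.
line = ("40703|master|180615-09:36:22:344|1 ms|statement|"
        "SELECT * FROM triggerscj WHERE PK=?|"
        "SELECT * FROM triggerscj WHERE PK=8796125856246")

fields = line.split("|")
names = ["pid?", "node?", "timestamp", "duration",
         "command_type?", "prepared_sql?", "bound_sql?"]

for name, value in zip(names, fields):
    print(f"{name:14} -> {value}")
```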

Any guidance on how I can get started on creating the logstash input block?

Thanks in advance..

HB


#2

Hi All,

I've gathered more information and am able to understand what this JDBC log is outputting. For a given log line:

79|master|180615-09:36:22:287|1 ms|statement|INSERT INTO tasks_aux_queue (PK, RANGE_VALUE, NODE_ID, NODE_GROUP, EXECUTION_TIME) SELECT * FROM (SELECT PK, floor(0 + (rand()(1000-0))) AS rangeCol, CASE WHEN p_nodeId IS NULL THEN -1 ELSE p_nodeId END AS nodeIdCol, CASE WHEN p_nodeGroup IS NULL THEN '---' ELSE p_nodeGroup END AS nodeGroupCol, p_expirationtimemillis/1000/60 AS execTimeCol FROM tasks WHERE p_failed = 0 AND p_expirationtimemillis <= ? AND p_runningonclusternode = -1) AS A WHERE NOT EXISTS (SELECT 1 FROM tasks_aux_queue AS B WHERE A.PK = B.PK)|INSERT INTO tasks_aux_queue (PK, RANGE_VALUE, NODE_ID, NODE_GROUP, EXECUTION_TIME) SELECT * FROM (SELECT PK, floor(0 + (rand()(1000-0))) AS rangeCol, CASE WHEN p_nodeId IS NULL THEN -1 ELSE p_nodeId END AS nodeIdCol, CASE WHEN p_nodeGroup IS NULL THEN '---' ELSE p_nodeGroup END AS nodeGroupCol, p_expirationtimemillis/1000/60 AS execTimeCol FROM tasks WHERE p_failed = 0 AND p_expirationtimemillis <= '1529080582279' AND p_runningonclusternode = -1) AS A WHERE NOT EXISTS (SELECT 1 FROM tasks_aux_queue AS B WHERE A.PK = B.PK)

Here are the fields:

thread id|datasource id|start time|elapsed time|category|prepared statement|sql statement
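To double-check that field mapping outside Logstash, here is a minimal Python sketch using my own field naming. The split is limited to six so that any stray "|" inside the final SQL statement would survive intact:

```python
from collections import namedtuple

# Field names follow the layout described above; this is my labeling,
# not anything confirmed by the JDBC logger's documentation.
Entry = namedtuple("Entry", ["thread_id", "datasource_id", "start_time",
                             "elapsed", "category",
                             "prepared_statement", "sql_statement"])

line = ("61|master|180615-09:36:22:386|0 ms|statement|"
        "SELECT isInitialized FROM metainformations WHERE PK = ?|"
        "SELECT isInitialized FROM metainformations WHERE PK = 15481123719086096")

# maxsplit=6 keeps any pipes inside the last field from adding extra columns.
entry = Entry(*line.split("|", 6))
print(entry.thread_id, entry.category, entry.elapsed)
```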

The LS parsing I'm attempting (and bear with me, I'm a novice with this) is:

grok {
  match => { "message" => "%{NUMBER:thread_id}\|%{WORD:datasource_id}\|%{DATA:timestamp}\|%{NUMBER:duration} %{WORD:interval}\|%{WORD:category}\|%{GREEDYDATA:prepared_statement}\|%{GREEDYDATA:sql_statement}" }
}

date { match => ["timestamp", "yyMMdd-HH:mm:ss:SSS"] }
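The timestamp format can also be sanity-checked outside Logstash. In Python, the roughly equivalent strptime pattern (my translation of yyMMdd-HH:mm:ss:SSS; %f absorbs the trailing millisecond digits as microseconds) parses a sample timestamp cleanly:

```python
from datetime import datetime

# "%y%m%d-%H:%M:%S:%f" is my Python equivalent of the Joda-style
# "yyMMdd-HH:mm:ss:SSS" used in the Logstash date filter above.
ts = datetime.strptime("180615-09:36:22:287", "%y%m%d-%H:%M:%S:%f")
print(ts)
```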

If I run the above log line in the Kibana grok debugger it appears to work fine:

{
"duration": "1",
"thread_id": "79",
"datasource_id": "master",
"prepared_statement": "INSERT INTO tasks_aux_queue (PK, RANGE_VALUE, NODE_ID, NODE_GROUP, EXECUTION_TIME) SELECT * FROM (SELECT PK, floor(0 + (rand()(1000-0))) AS rangeCol, CASE WHEN p_nodeId IS NULL THEN -1 ELSE p_nodeId END AS nodeIdCol, CASE WHEN p_nodeGroup IS NULL THEN '---' ELSE p_nodeGroup END AS nodeGroupCol, p_expirationtimemillis/1000/60 AS execTimeCol FROM tasks WHERE p_failed = 0 AND p_expirationtimemillis <= ? AND p_runningonclusternode = -1) AS A WHERE NOT EXISTS (SELECT 1 FROM tasks_aux_queue AS B WHERE A.PK = B.PK)",
"sql_statement": "INSERT INTO tasks_aux_queue (PK, RANGE_VALUE, NODE_ID, NODE_GROUP, EXECUTION_TIME) SELECT * FROM (SELECT PK, floor(0 + (rand()
(1000-0))) AS rangeCol, CASE WHEN p_nodeId IS NULL THEN -1 ELSE p_nodeId END AS nodeIdCol, CASE WHEN p_nodeGroup IS NULL THEN '---' ELSE p_nodeGroup END AS nodeGroupCol, p_expirationtimemillis/1000/60 AS execTimeCol FROM tasks WHERE p_failed = 0 AND p_expirationtimemillis <= '1529080582279' AND p_runningonclusternode = -1) AS A WHERE NOT EXISTS (SELECT 1 FROM tasks_aux_queue AS B WHERE A.PK = B.PK)",
"interval": "ms",
"category": "statement",
"timestamp": "180615-09:36:22:287"
}

It also runs fine when I send it to stdout: output { stdout { codec => rubydebug } }.

However, running it through LS does not work properly; the events get a _grokparsefailure tag.

I am pretty sure the problem is with the GREEDYDATA syntax, but I'm not knowledgeable enough to fix it. The log fields appear to be separated by pipes ("|") with no surrounding spaces throughout. Can anyone assist with what might be going on here?
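For what it's worth, one common workaround when two GREEDYDATA patterns sit back-to-back is to match the pipe-free fields with a negated character class ([^|]*) instead, so only the final field is greedy. Here is a Python regex sketch of that idea; it is not a verified fix for this Logstash pipeline, just an illustration of the pattern shape:

```python
import re

# Match each pipe-delimited field with [^|]* so the regex cannot
# backtrack across field boundaries; only the last field uses .*
pattern = re.compile(
    r"^(?P<thread_id>\d+)\|"
    r"(?P<datasource_id>[^|]+)\|"
    r"(?P<timestamp>[^|]+)\|"
    r"(?P<duration>\d+) (?P<interval>\w+)\|"
    r"(?P<category>[^|]*)\|"
    r"(?P<prepared_statement>[^|]*)\|"
    r"(?P<sql_statement>.*)$"
)

# A commit line with empty trailing fields still matches.
line = "40703|master|180615-09:36:22:341|3 ms|commit||"
m = pattern.match(line)
print(m.group("category"), repr(m.group("sql_statement")))
```

Grok accepts inline Oniguruma patterns of the form (?<field_name>[^|]*), so the same idea could in principle be expressed directly in the grok match string, though I have not tested that against this pipeline.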

TIA,

HB


#3

Is there anyone in the community that can point me in the right direction here?

Maybe just some pointers on how to troubleshoot?

Thanks,

HB


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.