Hi,
I have a (big, old) application which is monitored in production, to detect when something is misbehaving and proactively notify operators.
The current monitoring is composed of a big script which executes a batch of 200 tests every 15 minutes (details below). Then an (HTML) report is generated and sent by mail.
I would like to export the results of those tests to my Elastic Stack.
The objectives are not fully defined yet; in the medium term they could be:
- storing a history of the application's health, keeping a trace of what happened and when.
- generating KPIs
- using the alerting features provided by Elastic, rather than sending emails from inside the script.
My question:
As I'm free to choose the output generated by the script (number of 'events', their content, ...), I'm more or less starting from a blank page and have some choices to make:
- Should I consider my data as a collection of events? Metrics? A mix?
- Which (ECS) field names should/must I output, and in which indices?
- Which approach should I choose for the data structure, and which best practices should I follow (on top of ECS), in order to leverage the features already built into the Elastic Stack (observability, alerting, ...)?
Thanks in advance.
Content of a 'test':
A 'test' is composed of:
- a category, sub-category and a name
- a severity level (major, blocking, system blocking, ...)
- a final status of the test: "OK" / "FAIL", based on:
  - the value returned by the test (a string, a date or a number) ...
  - which is compared to a reference value for this test ...
  - using a comparison operator (<, >, =, in, ...) for this test.
- I also have other metadata for each test (unique id of the script execution, start / end timestamps, test execution duration, instance, ...)
- And finally some metadata related to the whole batch itself (uniqueID, instance, duration, ...)
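To make the status rule above concrete, here is a minimal Python sketch of how "OK" / "FAIL" is derived from the returned value, the reference value and the operator. The function name `evaluate` and the `OPERATORS` table are hypothetical names for illustration, not the actual script, and the sketch assumes the values have already been parsed into comparable types (numbers, dates, ...).

```python
import operator

# Hypothetical operator table; the real script supports more operators
# than the four listed here.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "in": lambda value, reference: value in reference,
}


def evaluate(returned_value, expected_value, eval_type):
    """Return "OK" if the comparison holds, otherwise "FAIL"."""
    compare = OPERATORS[eval_type]
    return "OK" if compare(returned_value, expected_value) else "FAIL"


# Both values are assumed to be typed already: comparing the raw
# strings "2" and "100" with "<" would not give the numeric result.
print(evaluate(2, 100, "<"))
print(evaluate("02/01/2024", "07/02/2024", "="))
```

Whatever structure I end up choosing, the returned and reference values probably need to keep their type (string, date, number) on the Elastic side as well, which is part of what I am unsure about.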
Below are two examples of test data:
{
"run_id":"MONITOR_PREPROD_1707316974989",
"instance_name":"PREPROD",
"test_name":"current business date",
"category":"Availability checks",
"sub_category":"Dates",
"test_severity":"3",
"test_result":"FAIL",
"test_returned_value":"02/01/2024",
"test_expected_value":"07/02/2024",
"test_eval_type":"=",
"test_start_ts":1707316974996,
"test_end_ts":1707316974999,
"test_duration_ms":3
}
{
"run_id":"MONITOR_PREPROD_1707316974989",
"instance_name":"PREPROD",
"test_name":"Messages pending in queue",
"category":"Processing queues",
"sub_category":"pending messages",
"test_severity":"1",
"test_returned_value":"2",
"test_expected_value":"100",
"test_eval_type":"<",
"test_result":"OK",
"test_start_ts":1707316975009,
"test_end_ts":1707316975011,
"test_duration_ms":2
}
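To illustrate the kind of choice I am facing, here is a rough Python sketch of one possible mapping from a raw test record to a document with a few ECS-style fields. This is only an assumption on my side, not a validated mapping: `to_document` and the `monitor_test` namespace are names I made up, and whether `event.*` and `labels.*` are the right places for this data is exactly what I am asking.

```python
from datetime import datetime, timezone


def iso_from_ms(epoch_ms):
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp string."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


def to_document(test):
    """Map one raw test record to a candidate Elasticsearch document."""
    return {
        "@timestamp": iso_from_ms(test["test_start_ts"]),
        "event": {
            "kind": "event",
            # ECS event.outcome accepts success / failure / unknown.
            "outcome": "success" if test["test_result"] == "OK" else "failure",
            "start": iso_from_ms(test["test_start_ts"]),
            "end": iso_from_ms(test["test_end_ts"]),
            # ECS event.duration is expressed in nanoseconds.
            "duration": test["test_duration_ms"] * 1_000_000,
        },
        "labels": {
            "instance_name": test["instance_name"],
            "run_id": test["run_id"],
        },
        # Hypothetical custom namespace for fields with no obvious ECS home.
        "monitor_test": {
            "name": test["test_name"],
            "category": test["category"],
            "sub_category": test["sub_category"],
            "severity": int(test["test_severity"]),
            "returned_value": test["test_returned_value"],
            "expected_value": test["test_expected_value"],
            "eval_type": test["test_eval_type"],
        },
    }
```

With this shape each test would be one document per execution; the alternative of one document per batch with nested test results is the other option I am hesitating about.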