How to structure observability data about the health of my (custom) app

Hi,

I have a (big, old) application which is monitored in production, to detect when something is misbehaving and to proactively notify operators.

The current monitoring consists of a big script which executes a batch of 200 tests every 15 minutes (details below); an HTML report is then generated and sent by mail.

I would like to export the results of those tests into my Elastic Stack.

The objectives are not fully defined yet; in the mid term they could include:

  • storing a history of the application's health, keeping a trace of what happened and when.
  • generating KPIs
  • using the alerting features provided by the Elastic Stack, rather than sending mails from the script.

My questions:

As I'm free to choose the output generated by the script (number of 'events', their content, ...), I'm essentially starting from a blank page and have some choices to make:

  • Should I model my data as a collection of events? As metrics? A mix?
    Which (ECS) field names should/must I output, and into which indices?

  • Which approach should I choose for the data structure, and which best practices should I follow (on top of ECS) in order to leverage the features already built into the Elastic Stack (Observability, alerting, ...)?

Thanks in advance.

Content of a 'test':

A 'test' is composed of:

  • a category, sub-category and a name
  • a severity level (major, blocking, system blocking, ...)
  • a final status of the test: "OK" / "FAIL", based on:
    • the value returned by the test (one of a string, a date, a number) ...
    • which is compared to a reference value for this test ...
    • using a comparison operator (<,>,=, in, ...) for this test.
  • other metadata around each test (unique ID of the script execution, start/end timestamps, test execution duration, instance, ...)
  • and finally, a few metadata fields related to the whole batch itself (unique ID, instance, duration, ...)
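For illustration, the OK/FAIL derivation described above could be sketched like this (Python; the operator set and the numeric coercion are assumptions on my part, the real script handles more cases):

```python
# Illustrative sketch only: derive a test's final status from its returned
# value, its reference value, and a comparison operator.

def evaluate_test(returned, expected, operator):
    """Return "OK" when the comparison holds, "FAIL" otherwise."""
    comparisons = {
        "=":  lambda a, b: a == b,
        "<":  lambda a, b: float(a) < float(b),   # numeric tests arrive as strings
        ">":  lambda a, b: float(a) > float(b),
        "in": lambda a, b: a in b,
    }
    return "OK" if comparisons[operator](returned, expected) else "FAIL"

print(evaluate_test("2", "100", "<"))                  # OK
print(evaluate_test("02/01/2024", "07/02/2024", "="))  # FAIL
```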

Below are two examples of test data:

   {
      "run_id":"MONITOR_PREPROD_1707316974989",
      "instance_name":"PREPROD",
      "test_name":"current business date",
      "category":"Availability checks",
      "sub_category":"Dates",
      "test_severity":"3",
      "test_result":"FAIL",
      "test_returned_value":"02/01/2024",
      "test_expected_value":"07/02/2024",
      "test_eval_type":"=",
      "test_start_ts":1707316974996,
      "test_end_ts":1707316974999,
      "test_duration_ms":3
   }
   {
      "run_id":"MONITOR_PREPROD_1707316974989",
      "instance_name":"PREPROD",
      "test_name":"Messages pending in queue",
      "category":"Processing queues",
      "sub_category":"pending messages",
      "test_severity":"1",
      "test_returned_value":"2",
      "test_expected_value":"100",
      "test_eval_type":"<",
      "test_result":"OK",
      "test_start_ts":1707316975009,
      "test_end_ts":1707316975011,
      "test_duration_ms":2
   }
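For context, here is a rough sketch of how the first example could be reshaped toward ECS. The field choices (event.outcome, event.duration in nanoseconds, custom details under labels.*) are only my current guess at a mapping, which is precisely what I'd like feedback on:

```python
import json

# Sketch: reshape the first raw test result above into an ECS-flavoured
# document. The mapping below is my assumption, not an Elastic recommendation.
raw = {
    "run_id": "MONITOR_PREPROD_1707316974989",
    "instance_name": "PREPROD",
    "test_name": "current business date",
    "category": "Availability checks",
    "sub_category": "Dates",
    "test_severity": "3",
    "test_result": "FAIL",
    "test_start_ts": 1707316974996,
    "test_duration_ms": 3,
}

ecs_doc = {
    "@timestamp": raw["test_start_ts"],  # epoch millis of the test start
    "event": {
        "kind": "event",
        "outcome": "failure" if raw["test_result"] == "FAIL" else "success",
        "duration": raw["test_duration_ms"] * 1_000_000,  # ECS duration is in ns
        "severity": int(raw["test_severity"]),
    },
    "service": {"name": "my-big-old-app", "environment": raw["instance_name"]},
    # custom, non-ECS details kept under a dedicated namespace:
    "labels": {
        "run_id": raw["run_id"],
        "category": raw["category"],
        "sub_category": raw["sub_category"],
        "test_name": raw["test_name"],
    },
}

print(json.dumps(ecs_doc, indent=2))
```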