I would like to ask for high-level advice on how to approach the following problem (we run on-premise Elastic 8.8.0).
Every 30 seconds, the following data about the status of a resource is ingested via Elastic Agent HTTP endpoint > Logstash > Elasticsearch.
The measurement data looks like:
...
{"timestamp": "2023-06-30T19:29:00Z", "id": "resource-id-1", "status": "READY"}
{"timestamp": "2023-06-30T19:29:30Z", "id": "resource-id-1", "status": "READY"}
{"timestamp": "2023-06-30T19:30:00Z", "id": "resource-id-1", "status": "READY"}
{"timestamp": "2023-06-30T19:30:30Z", "id": "resource-id-1", "status": "READY"}
{"timestamp": "2023-06-30T19:31:00Z", "id": "resource-id-1", "status": "READY"}
{"timestamp": "2023-06-30T19:31:30Z", "id": "resource-id-1", "status": "STEADY"}
{"timestamp": "2023-06-30T19:32:00Z", "id": "resource-id-1", "status": "STEADY"}
{"timestamp": "2023-06-30T19:32:30Z", "id": "resource-id-1", "status": "STEADY"}
{"timestamp": "2023-06-30T19:33:00Z", "id": "resource-id-1", "status": "STEADY"}
{"timestamp": "2023-06-30T19:33:30Z", "id": "resource-id-1", "status": "STEADY"}
{"timestamp": "2023-06-30T19:34:00Z", "id": "resource-id-1", "status": "GO"}
{"timestamp": "2023-06-30T19:34:30Z", "id": "resource-id-1", "status": "GO"}
{"timestamp": "2023-06-30T19:35:00Z", "id": "resource-id-1", "status": "GO"}
...
The resource is identified by its id (there are a handful of resources with different ids, not shown here for clarity).
My end goal is to compute the duration of the transition of every resource from READY to GO.
That is, for a particular resource, to compute the difference of timestamps between:
- the first STEADY after READY
- the first GO after STEADY
In this example, it would be 2023-06-30T19:34:00Z - 2023-06-30T19:31:30Z = 150 seconds
A resource can transition at any timestamp, and the transition can take anywhere from 30 to 3600 seconds.
Once the resource is in GO status, it can transition back to READY (but I am not interested in measuring this duration).
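To make the intended computation concrete, here is a minimal sketch of the logic in plain Python, outside of Elastic. It only illustrates the rule described above and is not a proposed Elastic solution; the function name transition_durations and the output field names are hypothetical, and it assumes the events for a resource arrive complete and can be sorted by timestamp.

```python
from datetime import datetime


def parse_ts(ts):
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def transition_durations(events):
    """Return one record per completed READY -> STEADY -> GO transition."""
    last_status = {}   # last seen status per resource id
    steady_start = {}  # timestamp of the first STEADY after READY, per id
    results = []
    ordered = sorted(events, key=lambda e: (e["id"], parse_ts(e["timestamp"])))
    for ev in ordered:
        rid, status, ts = ev["id"], ev["status"], parse_ts(ev["timestamp"])
        prev = last_status.get(rid)
        if status == "STEADY" and prev == "READY":
            # First STEADY after READY: the transition starts here.
            steady_start[rid] = ts
        elif status == "GO" and prev == "STEADY" and rid in steady_start:
            # First GO after STEADY: the transition ends here.
            start = steady_start.pop(rid)
            results.append({
                "id": rid,
                "steady_at": start,
                "go_at": ts,
                "duration_s": (ts - start).total_seconds(),
            })
        elif status == "READY":
            # Back to READY: discard any unfinished transition.
            steady_start.pop(rid, None)
        last_status[rid] = status
    return results


# Subset of the sample data above, around the two boundaries.
events = [
    {"timestamp": "2023-06-30T19:31:00Z", "id": "resource-id-1", "status": "READY"},
    {"timestamp": "2023-06-30T19:31:30Z", "id": "resource-id-1", "status": "STEADY"},
    {"timestamp": "2023-06-30T19:33:30Z", "id": "resource-id-1", "status": "STEADY"},
    {"timestamp": "2023-06-30T19:34:00Z", "id": "resource-id-1", "status": "GO"},
    {"timestamp": "2023-06-30T19:35:00Z", "id": "resource-id-1", "status": "GO"},
]
durations = transition_durations(events)
print(durations[0]["duration_s"])
```

A STEADY seen without a preceding READY (for example, at the start of the queried time window) is skipped in this sketch, since the real start of that transition is unknown.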
What would be the most natural "Elastic" way of tackling this situation?
I was considering the following:
- Elastic transform (Transforming data | Elasticsearch Guide [8.8] | Elastic). However, I am having a hard time figuring out how to filter for the two events that mark the boundaries of the transition. Once I have those two events, computing the duration is easy with transform grouping + aggregation.
- Elastic ingest pipeline, perhaps with an enrich processor.
- Elastic ML job (just an idea, I have not looked into this).
- An external job scheduled by cron, for example, to post-process the data.
- Something else?
Please let me know your thoughts.