# Timelion heartbeat downtime count analysis

A Spark Streaming data ingestion service sends a heartbeat once per batch loop (a field called `ingestion_alive` with a value of 1). So if the batch loop is 2 minutes, we expect to receive `30 (+-3)` heartbeat messages per hour.
What we would like to see is how many times the ingestion has failed and therefore didn't send heartbeat messages. Say that during the last 24 hours the ingestion failed 3 times: the first time it was offline for an hour, the second time for just 15 minutes, and the third time for 3 hours straight. How can we get the number of times the ingestion has failed?
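The arithmetic behind these numbers, as a small Python sketch (the helper name `expected_heartbeats` is hypothetical): with one heartbeat per batch loop, the expected count in a window is the window length divided by the loop duration, and the same formula gives how many heartbeats each outage in the example removes.

```python
def expected_heartbeats(window_minutes: float, loop_minutes: float) -> float:
    """Heartbeats a healthy ingestion sends in the window (one per batch loop)."""
    return window_minutes / loop_minutes


LOOP_MINUTES = 2  # batch loop duration from the example

per_hour = expected_heartbeats(60, LOOP_MINUTES)
print(per_hour)  # 30.0

# The three outages from the example, in minutes.
outages_minutes = [60, 15, 180]
missing = [expected_heartbeats(m, LOOP_MINUTES) for m in outages_minutes]
print(missing)  # [30.0, 7.5, 90.0]
```

A 15-minute outage therefore removes only about 7 or 8 of the 30 heartbeats in that hour, so that hour still shows roughly 22 or 23.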

My first idea was to sum the `ingestion_alive` field values per interval, take the cumulative sum of that (as seen in the picture), and then apply the `derivative()` function to the cumulative sum. The derivative should then be zero over the periods with no heartbeat, and I could count the derivative points that are zero. However, applying `derivative()` on top of `cusum()` just gives back the original time series, as if no functions had been applied at all. What am I doing wrong here?

``````
.es(index=default.gelf*,
    timefield='@timestamp',
    q='CF_APPLICATION_NAME:data-ingestion',
    metric='sum:ingestion_alive').cusum()
``````
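A minimal Python sketch of the pipeline described in the question, using hypothetical hourly bucket sums in place of the Elasticsearch data. `itertools.accumulate` plays the role of `.cusum()` and a first difference plays the role of `.derivative()`; the sketch simply drops the first bucket, which has no predecessor, and makes no claim about how Timelion treats that point.

```python
from itertools import accumulate

# Hypothetical per-bucket sums of `ingestion_alive` (one bucket per hour, 30 = healthy).
bucket_sums = [30, 30, 0, 30, 0, 0, 30, 30]

# Step 1: running total, analogous to Timelion's .cusum().
cusum = list(accumulate(bucket_sums))

# Step 2: difference between consecutive running totals, analogous to .derivative().
derivative = [cur - prev for prev, cur in zip(cusum, cusum[1:])]

# Step 3: count the derivative points that are zero.
zero_points = sum(1 for d in derivative if d == 0)

print(cusum)        # [30, 60, 60, 90, 90, 90, 120, 150]
print(derivative)   # [30, 0, 30, 0, 0, 30, 30]
print(zero_points)  # 3
```

The derivative equals `bucket_sums[1:]`, which matches the behaviour described in the question. Note also that the three zero points come from only two outages: counting zero points counts empty buckets, not failures.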

It took me a while to realize, but the derivative of a cumulative sum is just the original graph: the derivative undoes exactly what the cumulative sum did.

The cumulative sum for bucket 3 is x1 + x2 + x3.
The cumulative sum for bucket 4 is x1 + x2 + x3 + x4.
The derivative for bucket 4 is the cumulative sum for bucket 4 minus the cumulative sum for bucket 3, which is (x1 + x2 + x3 + x4) - (x1 + x2 + x3) = x4.
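The same cancellation holds for every bucket. Writing $x_i$ for the value of bucket $i$ and $C_n$ for the cumulative sum up to bucket $n$ (taking $C_0 = 0$):

$$
C_n = \sum_{i=1}^{n} x_i, \qquad
C_n - C_{n-1} = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n-1} x_i = x_n
$$

So chaining the two functions gives back the per-bucket values you started from.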

What I would do instead is plot the moving average of the count (or sum) and add a `.static` line to the chart with a value of 30. That way you can see where the series drops below the line (or set the line to 27 to allow for the +-3 variation).
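To turn that visual check into a number, one option is to count how many separate times the smoothed series dips below the line. Below is a minimal Python sketch on hypothetical hourly counts modelled on the outages from the question; the helpers `moving_average` and `count_dips` are illustrative, not Timelion functions. It assumes a trailing window, whereas Timelion's `.mvavg()` may align its window differently, and it uses the threshold of 27.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; only positions with a full window are returned."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def count_dips(values: list[float], threshold: float) -> int:
    """Count contiguous runs of values strictly below the threshold."""
    dips = 0
    below = False
    for v in values:
        if v < threshold and not below:
            dips += 1  # a new run below the line starts here
        below = v < threshold
    return dips


# Hypothetical hourly heartbeat counts for 24 hours (30 = healthy):
# a 1 h outage in hour 3, a 15 min outage in hour 10, a 3 h outage in hours 17-19.
hourly = [30.0] * 24
hourly[3] = 0.0
hourly[10] = 23.0
for h in (17, 18, 19):
    hourly[h] = 0.0

smoothed = moving_average(hourly, window=2)
print(count_dips(smoothed, threshold=27))  # 3
```

A longer window smooths out jitter further, but it can also merge two outages that are close together or lift a short outage back above the line, so the window and the threshold need to be tuned together.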
