A Spark Streaming data ingestion service sends one heartbeat per batch loop (a field called `ingestion_alive` with a value of 1). So if the batch loop is 2 minutes, we expect to receive `30 (±3)` heartbeat messages per hour.
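The expected hourly count follows directly from the loop length. The sketch below assumes exactly one heartbeat per batch loop and treats the ±3 as a symmetric tolerance. Both helper names are hypothetical and do not come from the service itself.

```python
def expected_heartbeats_per_hour(batch_loop_minutes: float) -> float:
    """One heartbeat per batch loop, so 60 / loop length heartbeats per hour."""
    return 60 / batch_loop_minutes


def within_expected_range(observed: int, batch_loop_minutes: float, tolerance: int = 3) -> bool:
    """True if an hourly heartbeat count is within +/- tolerance of the expected value."""
    expected = expected_heartbeats_per_hour(batch_loop_minutes)
    return abs(observed - expected) <= tolerance


print(expected_heartbeats_per_hour(2))  # 30.0
print(within_expected_range(28, 2))     # True
print(within_expected_range(20, 2))     # False
```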

What we would like to see is how many times the ingestion has failed and therefore didn't send heartbeat messages. Say that during the last 24 hours the ingestion failed 3 times: the first time it was offline for an hour, the second time for just 15 minutes, and the third time for 3 hours straight. The question is: how do we get the number of times the ingestion has failed?
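Framed outside of Kibana, "number of failures" is the number of maximal runs of consecutive empty buckets in the per-interval heartbeat count, not the number of empty buckets. The sketch below illustrates that counting logic only; it assumes the counts have already been bucketed (here, hypothetically, into 15-minute buckets, so a healthy bucket holds about 7 or 8 heartbeats) and is not a Timelion expression. The bucket size needs to be comfortably smaller than the shortest outage, otherwise a short outage that straddles a bucket boundary may never produce a fully empty bucket.

```python
from typing import Sequence


def count_outages(heartbeats_per_bucket: Sequence[int]) -> int:
    """Count maximal runs of consecutive empty buckets (one run = one outage)."""
    outages = 0
    previous_empty = False
    for count in heartbeats_per_bucket:
        empty = count == 0
        if empty and not previous_empty:
            outages += 1  # a new run of silence starts here
        previous_empty = empty
    return outages


# Hypothetical 15-minute buckets: a 1 h outage (4 empty buckets),
# a 15 min outage (1 empty bucket) and a 3 h outage (12 empty buckets).
buckets = [8, 7, 0, 0, 0, 0, 8, 7, 0, 8, 7] + [0] * 12 + [7, 8]
print(count_outages(buckets))  # 3
```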

I first thought I would sum the `ingestion_alive` field values per interval, take the cumulative sum over them (as seen in the picture), and then apply the `derivative()` function to the cumulative sum. My expectation was that the derivative would be zero (a flat slope in the cumulative sum) over the time periods with no heartbeat, so I could count the derivative points that are zero. However, applying `derivative()` to the output of `cusum()` only gives back the original time series, as if no functions had been applied. What am I doing wrong here?

```
.es(index=default.gelf*,
timefield='@timestamp',
q='CF_APPLICATION_NAME:data-ingestion',
metric='sum:ingestion_alive').cusum()
```
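The behaviour described above is what the arithmetic predicts, assuming `derivative()` acts like a first difference between consecutive points: if $S_t = x_1 + \dots + x_t$, then $S_t - S_{t-1} = x_t$, so differencing a cumulative sum undoes it. A zero in the derivative therefore marks exactly the same buckets as a zero in the per-interval sum, and counting zero points counts empty buckets rather than distinct outages. The sketch below uses plain Python lists as a stand-in for the Timelion series; the helper names and bucket values are hypothetical.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Running total, analogous to .cusum() on a bucketed series."""
    return list(accumulate(values))


def first_difference(values):
    """Change between consecutive points; the first point is kept as-is
    because the running total implicitly starts from 0."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


heartbeats = [8, 7, 0, 0, 8, 0, 7]   # hypothetical per-bucket sums
totals = cumulative_sum(heartbeats)  # [8, 15, 15, 15, 23, 23, 30]
print(first_difference(totals) == heartbeats)  # True
```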