Mean Time Between Documents


(John Duquette) #1

I am trying to determine the mean time between records, per month, for a set of documents. All documents have a timestamp, and I want to capture the average time between consecutive timestamps on a monthly basis. I have tried several trial-and-error approaches, but nothing gives me exactly what I want.

Doc 1:
{
  "_index": "my_index",
  "_type": "CO",
  "_source": {
    "timestamp": "2018-01-02T14:00:00",
    "segment": "none",
    "category": "Normal",
    "plannedStart": "2018-01-02T14:00:00",
    "subCategory": "Normal"
  }
}

Doc 2:
{
  "_index": "my_index",
  "_type": "CO",
  "_source": {
    "timestamp": "2018-01-05T14:00:00",
    "segment": "none",
    "category": "Normal",
    "plannedStart": "2018-01-05T14:00:00",
    "subCategory": "Normal"
  }
}

serial_diff seems to get me closest to the deltas between the docs, but how do I take the next step and aggregate the average of the deltas?
{
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day",
        "time_zone": "America/New_York",
        "min_doc_count": 1
      },
      "aggs": {
        "start": {
          "sum": {
            "field": "timestamp"
          }
        },
        "delta": {
          "serial_diff": {
            "buckets_path": "start",
            "lag": 1
          }
        }
      }
    }
  }
}
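For comparison, here is a minimal client-side Python sketch of what this pipeline is aiming for, under stated assumptions. Timestamps are bucketed per day (only non-empty days, as with min_doc_count 1), and each bucket is represented by its earliest timestamp rather than a sum, because a sum of epoch timestamps only equals a real timestamp when the day holds exactly one document. A lag-1 difference is then taken between consecutive buckets, as serial_diff does (the first bucket has no predecessor, so it yields no delta), and the deltas are averaged per month. The helper name monthly_mean_delta, the rule of assigning each delta to the month of the later bucket, and treating timestamps as naive local times (the time_zone setting is ignored) are all hypothetical choices, not Elasticsearch behavior. Daily bucketing also discards gaps between documents on the same day. On the server side, Elasticsearch's avg_bucket sibling pipeline aggregation is the usual candidate for averaging a per-bucket metric such as delta; check its buckets_path syntax against the documentation for your version.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_mean_delta(timestamps):
    """Mimic date_histogram(day) -> per-bucket value -> serial_diff(lag=1) -> monthly average.

    Hypothetical helper: each daily bucket is represented by its earliest timestamp,
    and each delta is attributed to the month of the later bucket. Returns hours.
    """
    # date_histogram with interval "day" and min_doc_count 1: only non-empty days exist.
    buckets = {}
    for raw in timestamps:
        t = datetime.fromisoformat(raw)
        day = t.date()
        if day not in buckets or t < buckets[day]:
            buckets[day] = t

    # serial_diff with lag 1: the first bucket has no predecessor, so it yields no delta.
    days = sorted(buckets)
    deltas_by_month = defaultdict(list)
    for prev_day, cur_day in zip(days, days[1:]):
        delta_hours = (buckets[cur_day] - buckets[prev_day]).total_seconds() / 3600
        deltas_by_month[cur_day.strftime("%Y-%m")].append(delta_hours)

    # The step asked about: average the deltas within each month.
    return {month: mean(values) for month, values in deltas_by_month.items()}


# The two example documents above, timestamps only.
print(monthly_mean_delta(["2018-01-02T14:00:00", "2018-01-05T14:00:00"]))
# {'2018-01': 72.0}
```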


(Igor Motov) #2

Could you define what you mean by "mean time between documents"? The way I understand it, if documents arrive at moments t[0], t[1], t[2], ..., t[N-1], t[N], you want to find ((t[1]-t[0]) + (t[2]-t[1]) + ... + (t[N]-t[N-1]))/N?
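Read literally, that definition is a short computation. A minimal Python sketch, assuming ISO-8601 timestamps without a time zone; the function name mean_time_between is hypothetical. With N + 1 documents there are N gaps, so the divisor is one less than the number of documents.

```python
from datetime import datetime


def mean_time_between(timestamps):
    """Return ((t[1]-t[0]) + ... + (t[N]-t[N-1])) / N in seconds.

    The timestamps are sorted first, so t[0] is the earliest and t[N] the latest.
    """
    t = sorted(datetime.fromisoformat(s) for s in timestamps)
    n = len(t) - 1  # N gaps between N + 1 documents
    if n < 1:
        raise ValueError("need at least two timestamps")
    return sum((t[i] - t[i - 1]).total_seconds() for i in range(1, n + 1)) / n


# The two documents from the question are 3 days apart.
print(mean_time_between(["2018-01-02T14:00:00", "2018-01-05T14:00:00"]))  # 259200.0
```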


(John Duquette) #3

That is exactly right: the sum of all deltas divided by N, i.e. Σ(t[i] - t[i-1]) for i = 1..N, divided by N.


(Igor Motov) #4

In that case, isn't it the same thing as:

(t[1]-t[0]+t[2]-t[1]+ ... + t[N]-t[N-1])/N = (t[N] - t[0])/N

If events arrive fairly frequently within a time period T, the first event arrives close to the beginning of the period and the last one close to the end, so (t[N] - t[0]) ≈ T, especially when T and N are large. The average time between events is then just T / N. You already know T (one month in your case), and you can get N by running a date histogram on your documents. Am I missing something?
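Because the sum telescopes, the exact per-month figure needs only the earliest timestamp, the latest timestamp and the count, which a monthly date_histogram with min and max sub-aggregations on timestamp (plus each bucket's doc_count) should provide. The Python sketch below computes both that exact value and the T / N approximation per calendar month so the two can be compared. Assumptions: the function name exact_and_approx_per_month is hypothetical, timestamps are naive ISO-8601 strings, gaps that cross a month boundary are ignored, N for the exact value is the number of gaps (document count minus one), N for the approximation is the document count as a date histogram reports it, and T is the actual length of each calendar month (28 to 31 days).

```python
import calendar
from collections import defaultdict
from datetime import datetime, timedelta


def exact_and_approx_per_month(timestamps):
    """For each calendar month, return (exact, approx) mean gap in hours.

    exact  = (t[N] - t[0]) / N, the telescoped sum, where N is the number of gaps
             (document count minus one) within that month.
    approx = T / count, with T the month length and count the number of documents,
             as a monthly date_histogram would report it.
    Months with fewer than two documents are skipped.
    """
    by_month = defaultdict(list)
    for raw in timestamps:
        t = datetime.fromisoformat(raw)
        by_month[(t.year, t.month)].append(t)

    result = {}
    for (year, month), ts in sorted(by_month.items()):
        count = len(ts)
        if count < 2:
            continue
        gaps = count - 1
        exact = (max(ts) - min(ts)).total_seconds() / 3600 / gaps
        days_in_month = calendar.monthrange(year, month)[1]
        approx = days_in_month * 24 / count
        result[f"{year}-{month:02d}"] = (exact, approx)
    return result


# Sparse: the two documents from the question. Dense (hypothetical): one every 6 hours in February 2018.
sparse_jan = ["2018-01-02T14:00:00", "2018-01-05T14:00:00"]
dense_feb = [(datetime(2018, 2, 1) + timedelta(hours=6 * i)).isoformat() for i in range(28 * 4)]
print(exact_and_approx_per_month(sparse_jan + dense_feb))
# {'2018-01': (72.0, 372.0), '2018-02': (6.0, 6.0)}
```

With only the two example documents, January gives 72 hours exactly but 372 hours from T / N; with a document every six hours through February the two agree. The approximation is only as good as the assumption that events arrive frequently across the whole month.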

