Detecting gaps in numeric range using ElasticSearch


#1

We have a cluster that serves data similar to that shown below:

{
    "identifier" : "A",
    "sequenceNumber" : 1
}

{
    "identifier" : "A",
    "sequenceNumber" : 2
}

{
    "identifier" : "A",
    "sequenceNumber" : 4
}

{
    "identifier" : "A",
    "sequenceNumber" : 8
}

What we would like to do is create a service that, given an identifier, returns any gaps in its sequence numbers. For instance, given identifier A, the service would return something similar to the following:

"gaps" : [
    {
        "lowerBound" : 2,
        "upperBound" : 4
    },
    {
        "lowerBound" : 4,
        "upperBound" : 8
    }
]
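
To pin down the expected output, here is a minimal Python sketch of the gap definition implied above: a gap is any pair of adjacent stored sequence numbers that differ by more than 1. The function name `find_gaps` is hypothetical, and the sketch works on an in-memory list of numbers rather than on the cluster itself.

```python
from typing import Dict, Iterable, List


def find_gaps(sequence_numbers: Iterable[int]) -> List[Dict[str, int]]:
    """Return the gaps between adjacent sequence numbers.

    Each gap is reported as the two existing values that surround the
    missing numbers, matching the lowerBound/upperBound shape above.
    """
    # Sort and de-duplicate so that neighbours can be compared pairwise.
    ordered = sorted(set(sequence_numbers))
    gaps = []
    for lower, upper in zip(ordered, ordered[1:]):
        # A difference greater than 1 means at least one number is missing.
        if upper - lower > 1:
            gaps.append({"lowerBound": lower, "upperBound": upper})
    return gaps


print(find_gaps([1, 2, 4, 8]))
# [{'lowerBound': 2, 'upperBound': 4}, {'lowerBound': 4, 'upperBound': 8}]
```

This only works if all sequence numbers for an identifier can be pulled back in one pass, which may not be practical for large datasets.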

Our original thought was to get the maximum sequenceNumber for a particular identifier and then use range aggregation queries to see which sections have gaps. For instance, for the above dataset the maximum would be 8, and the first round of range aggregation queries would look similar to the following:

{
    "aggs" : {
        "identifier" : {
            "range" : {
                "field" : "sequenceNumber",
                "ranges" : [
                    { "to" : 4 },
                    { "from" : 5 }
                ]
            }
        }
    }
}

"aggregations": {
    "identifier" : {
        "buckets": {
            "*-4": {
                "to": 4,
                "doc_count": 3
            },
            "5-*": {
                "from": 5,
                "doc_count": 1
            }
        }
    }
}

Recursively querying each range that returned fewer documents than expected would eventually reveal the gaps. My question: is there an easier way to do this with ElasticSearch?
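
For reference, the recursive idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: `count_in_range` is a hypothetical callback standing in for one count or range aggregation request (filtered by identifier) that returns how many documents have a sequenceNumber in the inclusive range [low, high]; sequence numbers are assumed to start at 1 and to be unique per identifier. Here the callback is simulated with an in-memory set so the sketch runs without a cluster.

```python
from typing import Callable, Dict, List, Tuple

CountFn = Callable[[int, int], int]


def missing_ranges(count_in_range: CountFn, low: int, high: int) -> List[Tuple[int, int]]:
    """Bisect [low, high] until every missing run of numbers is isolated."""
    if low > high:
        return []
    expected = high - low + 1
    actual = count_in_range(low, high)
    if actual == expected:
        return []                 # nothing missing in this range
    if actual == 0:
        return [(low, high)]      # the whole range is missing
    mid = (low + high) // 2
    # Partially filled range: split it and inspect both halves.
    return (missing_ranges(count_in_range, low, mid)
            + missing_ranges(count_in_range, mid + 1, high))


def to_gaps(missing: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Merge adjacent missing runs and report the surrounding existing values."""
    merged: List[Tuple[int, int]] = []
    for start, end in missing:
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [{"lowerBound": s - 1, "upperBound": e + 1} for s, e in merged]


# Simulated data for identifier A (stands in for the real count request).
stored = {1, 2, 4, 8}


def count_in_range(low: int, high: int) -> int:
    return sum(1 for n in stored if low <= n <= high)


print(to_gaps(missing_ranges(count_in_range, 1, max(stored))))
# [{'lowerBound': 2, 'upperBound': 4}, {'lowerBound': 4, 'upperBound': 8}]
```

Each call to `count_in_range` would be one round trip, so the number of requests grows with the number of gaps and the logarithm of the range size. When mapping the inclusive [low, high] onto a range aggregation, note that its `from` is inclusive and its `to` is exclusive, so the bucket would be `from: low, to: high + 1`.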

