Assistance with a query - used phone extensions

We have an application that will send, every minute, the hook status of several dozen phone extensions.

The data looks like this:
{
"mapping": {
"_doc": {
"properties": {
"date": {
"type": "date",
"format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
},
"extension": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"tag": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}

This means I've got thousands of rows for each extension, each time stamped, allowing me to go to a particular moment in time and determine if a particular extension was in use at that time.

Now, a requirement has come down to identify extensions that are NOT used. The data source will report (status field) ONLINE or OFFLINE with each record.

I'd like to come up with a query that will return all extensions that have only EVER been offline. something like (pseudocode, I'm new to ES) count(status:online) = 0 group by extension.

I've tried aggs and filters and have found myself stumped. I'd appreciate any help anyone might have!

I would use aggregations. Start with a terms aggregation to get a bucket for every extension. Next, use the cardinality aggregation to get the number of unique statuses per extension. This should be 1 or 2. Now, you can use the bucket_selector pipeline aggregation to select only those extension buckets with 1 unique status. Assuming that those extensions that only have one unique status all have the status OFFLINE, this will return the list of extensions you are looking for. The request would look like this:

{
  "size": 0,
  "aggs": {
    "extension": {
      "terms": {
        "field": "extension.keyword",
        "size": 1000
      },
      "aggs": {
        "unique_statuses": {
          "cardinality": {
            "field": "status.keyword"
          }
        },
        "bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "unique_statuses": "unique_statuses"
            },
            "script": "params.unique_statuses == 1"
          }
        }
      }
    }
  }
}

The size of the terms aggregation is 1000 in the request above. As a result, you will only get a list of up to 1000 extensions. If you want to retrieve a larger number of extensions, you can increase this number, but if the number is huge you could consider using the composite aggregation instead of the terms agg.

Now, if the assumption that extensions that only have one unique status have the status OFFLINE is incorrect (ie. you also have extensions that have a unique status ONLINE), then the approach would be a bit more complex. You could use a scripted metric aggregation that calculates a number depending on the statuses in a bucket (0 for OFFLINE and 1 for ONLINE). The bucket_selector script can then also check for that number to select only those extensions that have the unique status OFFLINE.

{
  "size": 0,
  "aggs": {
    "extension": {
      "terms": {
        "field": "extension.keyword",
        "size": 1000
      },
      "aggs": {
        "unique_statuses": {
          "cardinality": {
            "field": "status.keyword"
          }
        },
        "offline_or_online": {
          "scripted_metric": {
            "init_script": "params._agg.status = 0",
            "map_script": "params._agg.status = doc['status.keyword'].value == 'OFFLINE' ? 0 : 1",
            "combine_script": "return params._agg.status",
            "reduce_script": "return params._aggs[0]"
          }
        },
        "bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "unique_statuses": "unique_statuses",
              "offline_or_online": "offline_or_online.value"
            },
            "script": "params.unique_statuses == 1 && params.offline_or_online == 0"
          }
        }
      }
    }
  }
}
2 Likes

EXCELLENT solution. This is wonderful, and really gives me (as an ES novice) confidence that I've selected the right platform. Thank you so much for your quick and very comprehensive solution!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.