Dec 19, 2022: [EN] Transform Your SLO Dashboards

Kibana is a useful tool for monitoring applications and services to ensure they are operating within specified service level objectives. Service level indicators (SLIs) are measurable aspects of a service, such as error codes and latency. Service level objectives (SLOs) define how an application or service is expected to perform as measured by the SLIs, and in a way, set service uptime and availability goals. As logging and metrics data generated by applications grow, so do the demands on the Elasticsearch cluster for processing aggregations over SLI data.

If you have ever assembled an SLO dashboard for a highly dense metrics dataset, you might already know how taxing SLI visualizations backed by millions of events can be on a cluster as each dashboard visualization performs one or more aggregations against the backing indices. One such aggregation, for example, might show the number of HTTP errors grouped by response code over a specified time interval. Another might aggregate proxy logs to show backend request latency over time.

Enter Elasticsearch Transforms. Transforms can pre-aggregate SLI metrics, such as HTTP response codes, for SLO dashboards. A transform queries existing indices and writes summarized data to a smaller index that visualizations can use, allowing fast retrieval of aggregated data without searching the entire dataset.

Here, we will show you how to set up and use a transform using the sample web logs provided in Kibana. The following was performed on version 8.5.2 of the Elastic Stack running in Elastic Cloud.

Loading The Sample Data

  1. Follow the Kibana Quick Start guide to add sample web logs data.
  2. Use Discover to gain some familiarity with the web log data and fields.

1. Configuration

We will be creating a pivot transform.

  1. Open the main menu, then Stack Management > Transforms > Create Transform.

  2. Choose Kibana Sample Data Logs as the data source.

  3. Be sure Pivot is selected.

  4. Set Group by to @timestamp. Click the pencil icon and set the Interval to 1h.

  5. Next, we will define the aggregations we want to execute and send to the transform destination index. Click Add an aggregation …, then type response to filter the selection box. Click filter(response).

  6. Fill in the filter aggregation properties listed in the table below. Add a range query in the bool query's should clause as shown, then click Apply.

    Property          Value
    Aggregation name  response.2xx
    Field             response.keyword
    Aggregation       filter
    Filter query      bool

    {
      "must": [],
      "must_not": [],
      "should": [
        {
          "range": {
            "response.keyword": {
              "gte": "200",
              "lt": "300"
            }
          }
        }
      ]
    }


  7. Continue by adding three more filter aggregations for the 3xx, 4xx, and 5xx response status code ranges. Be sure to click Add an aggregation … for each group of status codes.

  8. Let's add one more aggregation for all response codes. Use a value count aggregation on the response.keyword field and name the aggregation response.total.

  9. With our five aggregations grouped by date, the transform preview should contain six fields. The preview shows a sample of the data that will be indexed to the destination transform index when the transform executes. If the preview looks good, click Next.
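If you prefer to sanity-check the pivot outside the UI, the same configuration can be tried with the transform preview API from Dev Tools. This is a sketch assuming the default sample data index name `kibana_sample_data_logs`; it shows only the 2xx filter and the total value count, since the 3xx, 4xx, and 5xx filters follow the same pattern:

```
POST _transform/_preview
{
  "source": { "index": "kibana_sample_data_logs" },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" }
      }
    },
    "aggregations": {
      "response.2xx": {
        "filter": {
          "bool": {
            "should": [
              { "range": { "response.keyword": { "gte": "200", "lt": "300" } } }
            ]
          }
        }
      },
      "response.total": {
        "value_count": { "field": "response.keyword" }
      }
    }
  }
}
```

The preview response returns a sample of the pivoted documents without creating anything, which makes it a low-risk way to iterate on the aggregation definitions.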

2. Transform Details

  1. Provide a name for the transform in the Transform ID box, an optional description, and a destination index.

  2. Click Next.

  3. Click Create and start.
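The same result can be achieved with the transform APIs. Assuming a hypothetical transform ID of `sli_data_log_responses` and the destination index used later in this post, the Dev Tools equivalent looks roughly like this (the `group_by` and `aggregations` sections are the ones built in the configuration steps above, elided here for brevity):

```
PUT _transform/sli_data_log_responses
{
  "source": { "index": "kibana_sample_data_logs" },
  "dest": { "index": "transform_sli_data_log_responses" },
  "pivot": {
    "group_by": { ... },
    "aggregations": { ... }
  }
}

POST _transform/sli_data_log_responses/_start
```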

Transform Status

The Transforms management page should show the transform as started. Click the arrow next to the transform ID, and select Stats to check its progress.
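The same information is available from the stats API; assuming a transform ID of `sli_data_log_responses`:

```
GET _transform/sli_data_log_responses/_stats
```

The response includes counters such as documents processed and documents indexed, which indicate how far along the transform is.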

Visualizing

The aggregated transform data can now be used for visualizations. Open Discover and select the data view (aka, index pattern) for the transform destination index, then inspect a sample document. Be sure to set the time picker far enough back to view the data set.
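For orientation, a destination document covering one hour might look roughly like the following. The field names come from the aggregation names defined earlier; the counts and timestamp are purely illustrative:

```
{
  "@timestamp": "2023-03-01T12:00:00.000Z",
  "response.2xx": 5,
  "response.3xx": 0,
  "response.4xx": 0,
  "response.5xx": 1,
  "response.total": 6
}
```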

The web server was not very busy during the 12:00 hour, serving only 6 requests. A single Lens visualization can show the SLO target.

Create A Visualization

Add a Lens visualization with the following configuration to a new or existing dashboard:

Configuration        Values
Visualization type   Lens / Area
Data view            transform_sli_data_log_responses
Horizontal axis      Function: Date histogram
                     Field: @timestamp
                     Minimum interval: 1h
                     Drop partial intervals: disabled
Vertical axis (I)    Success Rate > Data
                     Method: Formula
                     Formula: (sum(response.2xx) + sum(response.3xx) + sum(response.4xx)) / sum(response.total)
                     Appearance > Name: Success Rate
                     Value format: Percent
Vertical axis (II)   Failure Rate > Data
                     Method: Formula
                     Formula: sum(response.5xx) / sum(response.total)
                     Appearance > Name: Failure Rate
                     Value format: Percent
Reference lines      Data view: transform_sli_data_log_responses
                     Vertical left axis > Method: Static value
                     Reference line value: 0.95
                     Icon decoration: Alert
                     Line: 2px
                     Color: #F70E0E
Left axis            Axis title > Custom: "Request Rate"
