Dec 23rd, 2024: [EN] Building a Gift Production Monitoring System with Elastic Stack

This post is also available in Portuguese.

The Challenge at the North Pole

"Ho ho... oh no!" exclaimed Santa Claus on a frosty morning. "There are only two days until Christmas! How can I keep track of the wishes of billions of children without proper monitoring?"

For centuries, Santa's workshop relied on magic dust and elf intuition to track toy production. But in 2024, even the North Pole needs a tech upgrade! Today, we’ll help Santa modernize his factory by implementing a real-time monitoring solution with the Elastic Stack.

Requirements

Mrs. Claus was clear about what we need:

  • Track toy production across all factory lines
  • Monitor elf performance (keeping the Christmas spirit high!)
  • Ensure quality control (no child should receive broken toys!)
  • Monitor factory conditions (elves need ideal temperatures for their hot cocoa!)

Magical Prerequisites

Before we start enchanting the factory, make sure you have:

  • Python 3.8+ (tested by elves, approved by Santa)
  • Access to Elastic Cloud (or local Elastic Stack 8.16)

Factory Structure

First, let’s organize our factory like Santa organizes his gift list. Clone the repository from GitHub:

git clone https://github.com/salgado/santa_advent_monitoring.git 
cd santa_advent_monitoring

By the end, your folder structure will look like this:

santa_advent_monitoring/
├── manage_simulation.sh
├── toy_workshop_simulator.py
│   
├── var/
│   └── log/                      
│       └── workshop/
│           └── production/
└── README.md

The Factory Example

We created a Python script that simulates Santa’s workshop, generating JSON logs with toy production metrics. The simulation is managed by a Bash script, responsible for starting, stopping, and monitoring production.

The Python code generates realistic data on toy types, production rates, quality scores, and environmental conditions. The Bash script manages the simulation lifecycle, rotates logs, and provides status updates. Think of it as a miniature version of a real production line—but instead of physical toys, it produces structured data to feed the Elastic Stack in real time.

Factory JSON Example

{
  "@timestamp": "2024-12-11T16:54:36.980882",
  "toy_type": "robot",
  "production_line": "line1",
  "production_rate": 101,
  "quality_score": 95,
  "elf_id": "elf_20",
  "errors_detected": 0,
  "temperature": 22.8,
  "humidity": 47.3,
  "shift": "evening",
  "machine_status": "normal",
  "toys_completed": 25,
  "version": "8.16.1"
}

This log contains production information: toy type, production line, rate, quality, environmental conditions, and more.
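To make the shape of these events concrete, here is a minimal sketch of how one such log line could be generated. It is not the code from toy_workshop_simulator.py: the function names `make_event` and `to_log_line`, the value ranges, the list of shifts, and the rule that sets machine_status are all assumptions chosen to resemble the sample above.

```python
import json
import random
from datetime import datetime

# Value pools taken from the examples in this post.
TOY_TYPES = ["robot", "doll", "car", "puzzle", "board_game"]
LINES = ["line1", "line2", "line3"]
# Assumption: only "evening" appears in the post; the other shifts are guesses.
SHIFTS = ["morning", "evening", "night"]


def make_event(rng: random.Random) -> dict:
    """Build one production event with the same fields as the sample log."""
    errors = rng.randint(0, 5)
    return {
        "@timestamp": datetime.now().isoformat(),
        "toy_type": rng.choice(TOY_TYPES),
        "production_line": rng.choice(LINES),
        "production_rate": rng.randint(80, 160),  # assumed range
        "quality_score": rng.randint(60, 100),    # assumed range
        "elf_id": f"elf_{rng.randint(1, 20)}",
        "errors_detected": errors,
        "temperature": round(rng.uniform(18.0, 25.0), 1),  # assumed range
        "humidity": round(rng.uniform(40.0, 55.0), 1),     # assumed range
        # Assumed rule: many errors put the machine into an error state.
        "machine_status": "error" if errors > 3 else "normal",
        "shift": rng.choice(SHIFTS),
        "toys_completed": rng.randint(15, 50),
        "version": "8.16.1",
    }


def to_log_line(event: dict) -> str:
    """Serialize an event as a single-line JSON object (one event per line)."""
    return json.dumps(event)


print(to_log_line(make_event(random.Random(42))))
```

Writing one JSON object per line is what lets a log collector decode each line independently.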

Field Descriptions

| Field | Description | Examples | Importance |
|---|---|---|---|
| toy_type | Type of toy produced | robot, doll, car, puzzle, board_game | Defines complexity and production rate |
| production_line | Production line | line1, line2, line3 | Locates production origin |
| quality_score | Quality score | 0-100 | Critical: must be ≥ 90 |
| elf_id | Elf identifier | elf_1 to elf_20 | Tracks performance |
| errors_detected | Errors detected | 0-5 | Critical if > 3 |
| machine_status | Machine status | normal, error | Indicates production health |
| toys_completed | Completed units | 15-50 | Measures productivity |
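The thresholds in the field descriptions can be expressed as a small check. This is only an illustrative sketch: `check_event` is a hypothetical helper, not part of the repository, and it encodes just the two "critical" rules and the machine status listed above.

```python
from typing import List

QUALITY_MIN = 90  # "Critical: must be >= 90"
MAX_ERRORS = 3    # "Critical if > 3"


def check_event(event: dict) -> List[str]:
    """Return a list of human-readable problems found in one event."""
    problems = []
    if event["quality_score"] < QUALITY_MIN:
        problems.append(
            f"quality_score {event['quality_score']} is below {QUALITY_MIN}"
        )
    if event["errors_detected"] > MAX_ERRORS:
        problems.append(
            f"errors_detected {event['errors_detected']} exceeds {MAX_ERRORS}"
        )
    if event["machine_status"] != "normal":
        problems.append(f"machine_status is {event['machine_status']!r}")
    return problems


healthy = {"quality_score": 95, "errors_detected": 0, "machine_status": "normal"}
print(check_event(healthy))  # prints []
```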

Initial Factory Configuration

configs/simulator/production_config.yml:

toy_types:
  robot:
    base_rate: 100
    complexity: 0.8
    min_quality: 85
    components: ["circuit_board", "motors", "sensors"]
  doll:
    base_rate: 150
    complexity: 0.6
    min_quality: 88
    components: ["fabric", "stuffing", "clothes"]
  # ... more toy types ...

production_lines:
  line1:
    efficiency: 0.95
    error_rate: 0.02
    maintenance_schedule: "0 */4 * * *"
  # ... more production lines ...
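Before feeding a configuration like this to a simulator, it can help to sanity-check the values. The sketch below mirrors the YAML as a plain dictionary so it runs without a YAML library. The helper `validate_config` and the accepted ranges (fractions between 0 and 1, quality between 0 and 100) are assumptions, not rules taken from the repository.

```python
from typing import List

# Mirrors the YAML above; only the entries shown in the post are included.
CONFIG = {
    "toy_types": {
        "robot": {"base_rate": 100, "complexity": 0.8, "min_quality": 85},
        "doll": {"base_rate": 150, "complexity": 0.6, "min_quality": 88},
    },
    "production_lines": {
        "line1": {"efficiency": 0.95, "error_rate": 0.02},
    },
}


def validate_config(config: dict) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    for name, toy in config.get("toy_types", {}).items():
        if toy["base_rate"] <= 0:
            problems.append(f"{name}: base_rate must be positive")
        if not 0 <= toy["complexity"] <= 1:
            problems.append(f"{name}: complexity must be between 0 and 1")
        if not 0 <= toy["min_quality"] <= 100:
            problems.append(f"{name}: min_quality must be between 0 and 100")
    for name, line in config.get("production_lines", {}).items():
        for key in ("efficiency", "error_rate"):
            if not 0 <= line[key] <= 1:
                problems.append(f"{name}: {key} must be between 0 and 1")
    return problems


print(validate_config(CONFIG))  # prints []
```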

Implementing the Factory Simulator

Let’s get Santa’s factory up and running! :santa:

The Production Simulator

Ensure you’re in the correct directory:

cd santa_advent_monitoring

Management Script

manage_simulation.sh

Starting the Simulation

To start toy production:

./manage_simulation.sh start

This starts the simulator in the background, creates the log directory structure, and runs toy_workshop_simulator.py, which generates JSON events simulating production. The process PID is saved in simulator.pid for future management, and all logs go to toys.log. We’re powering up our virtual factory! :santa::factory:
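The PID file makes it possible to check later whether the simulator is still alive. The real check lives in manage_simulation.sh; the following is only a sketch of the same idea in Python, for POSIX systems. The function name `simulator_running` and the default location of simulator.pid are assumptions.

```python
import os
from pathlib import Path


def simulator_running(pid_file: str = "simulator.pid") -> bool:
    """Return True if the PID stored in pid_file belongs to a live process.

    POSIX only: on Windows, os.kill behaves differently and must not be
    used for this kind of check.
    """
    path = Path(pid_file)
    if not path.exists():
        return False
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        return False  # the file exists but does not contain a number
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is sent
    except ProcessLookupError:
        return False  # stale PID file: the process is gone
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


print("running" if simulator_running() else "stopped")
```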

Connecting Our Factory to the Cloud :cloud:

With the simulator running, let’s connect it to Elastic Cloud using the Elastic Agent.

Elastic Agent Installation

Follow the instructions in Kibana under Fleet → Agents to install the Elastic Agent for your environment.

Configure the factory log path:

/your-full-path/santa_advent_monitoring/var/log/workshop/production/toys.log

Note: At the end of the installation, selecting the "Enroll in Fleet" option will also set up the Fleet Server.

Configuring Factory Log Collection :christmas_tree:

With logs being generated, we need to configure the Elastic Agent to collect them. Let’s adjust the "Custom Logs" integration.

Navigating to the Configuration

  1. In Kibana, go to Fleet
  2. Under Agent policies, select Agent policy 1
  3. Click Add integration
  4. Search for "Custom logs" and select it

Adjusting the Integration

1. Basic Settings

  • Integration name: santa_workshop_logs
  • Description: Santa's Workshop Pro

2. Log Path

/Users/username/your_path/santa_advent_monitoring/var/log/workshop/production/toys.log

3. Advanced Configurations

In "Custom configurations":

json:
  keys_under_root: true
  add_error_key: true
  overwrite_keys: true
  decode_json_fields:
    fields: ["message"]
    target: ""
    process_array: false

Key points:

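  • json.keys_under_root: Places the decoded fields (@timestamp, toy_type, production_rate, and so on) at the root of the document instead of nesting them under a "json" key, which makes searches easier.
  • json.add_error_key: Adds an error field when a line cannot be parsed as JSON, helping identify malformed entries.
  • json.overwrite_keys: When a decoded key conflicts with a field the agent adds on its own (such as @timestamp), the value from the log line wins, so events keep the timestamp written by the simulator.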
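The effect of these three options can be illustrated with a rough emulation. This is not the agent's actual code: `decode_line` is a hypothetical function, the agent-side timestamp is a made-up constant, and the event structure is heavily simplified.

```python
import json

# Illustrative stand-in for the timestamp the agent would assign at read time.
AGENT_TIMESTAMP = "2024-12-11T17:00:00.000Z"


def decode_line(line: str, keys_under_root: bool = True,
                overwrite_keys: bool = True,
                add_error_key: bool = True) -> dict:
    """Roughly emulate the json.* options on a single log line."""
    event = {"@timestamp": AGENT_TIMESTAMP}
    try:
        decoded = json.loads(line)
        if not isinstance(decoded, dict):
            raise ValueError("line is not a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        event["message"] = line  # keep the raw line for inspection
        if add_error_key:
            event["error"] = {"message": str(exc), "type": "json"}
        return event
    if not keys_under_root:
        event["json"] = decoded  # decoded fields stay nested under "json"
        return event
    for key, value in decoded.items():
        if key in event and not overwrite_keys:
            continue  # keep the agent's own value
        event[key] = value
    return event


line = '{"@timestamp": "2024-12-11T16:54:36.980882", "toy_type": "robot"}'
print(decode_line(line))
print(decode_line("this is not JSON"))
```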

4. Dataset

  • Dataset name: "generic" (or "santa_workshop" if preferred)
  • Namespace: "default"

5. Finalizing

  • Under "Where to add this integration?", confirm that Agent policy 1 is selected
  • Click Save integration

Verifying the Configuration

After saving, wait a few moments. In Kibana, go to Discover and select a data view that matches the logs-* index pattern. You should see events arriving with their toy types, production rates, quality scores, and more.
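If you prefer to verify ingestion with a query, the sketch below builds a search body that counts recent events per toy type. The helper name `recent_events_query` is hypothetical, and the aggregation assumes toy_type is mapped as a keyword field (otherwise a field such as toy_type.keyword may be needed). The script only prints the body; it does not contact a cluster.

```python
import json


def recent_events_query(minutes: int = 15) -> dict:
    """Search body: count events from the last N minutes per toy type."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
        "aggs": {"by_toy_type": {"terms": {"field": "toy_type"}}},
    }


print(json.dumps(recent_events_query(), indent=2))
```

The printed body can be pasted into Kibana Dev Tools after a line such as GET logs-*/_search.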

Building Santa’s Command Center :christmas_tree:

Let’s create a dashboard that will impress even the most technical elf.

Initial Data View Setup

  1. In Stack Management → Data Views → Create data view:
    • Name: santa-workshop
    • Index pattern: logs-*
    • Timestamp field: @timestamp

Creating the Dashboard

  1. In Kibana, go to Dashboards
  2. Click Create dashboard

Creating Visualizations

Use Create visualization for each one:

1. Total Toys Produced (Metric)

  • Metric: Sum of toys_completed
  • Title: "Total Toys Produced"

2. Quality Score (Metric)

  • Metric: Average of quality_score
  • Title: "Quality Score"
  • Format: Percentage

3. Average Production Rate (Metric)

  • Metric: Average of production_rate
  • Title: "Average Production Rate (per hour)"

4. Distribution by Toy Type (Pie)

  • Metric: Count
  • Split slices: toy_type
  • Title: "Production by Toy Type"

5. Line Performance (Vertical Bar)

  • Y-axis: production_rate
  • X-axis: production_line
  • Title: "Production Line Performance"

6. Environmental Conditions (Line)

  • Y-axis: temperature and humidity
  • X-axis: @timestamp
  • Title: "Workshop Environmental Conditions"
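To cross-check the numbers on the dashboard, the same aggregations can be computed directly from log lines. This is a minimal sketch with a hypothetical `summarize` helper; it covers the sum, the two averages, and the count per toy type used in the visualizations above.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Compute dashboard-style metrics from JSON log lines."""
    total_toys = 0
    quality_sum = 0
    rate_sum = 0
    count = 0
    by_type = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        total_toys += event["toys_completed"]
        quality_sum += event["quality_score"]
        rate_sum += event["production_rate"]
        by_type[event["toy_type"]] += 1
        count += 1
    return {
        "total_toys": total_toys,
        "avg_quality": quality_sum / count if count else None,
        "avg_rate": rate_sum / count if count else None,
        "by_toy_type": dict(by_type),
    }


sample = [
    '{"toy_type": "robot", "toys_completed": 25, "quality_score": 95, "production_rate": 101}',
    '{"toy_type": "puzzle", "toys_completed": 25, "quality_score": 69, "production_rate": 103}',
]
print(summarize(sample))
```

To run it against the real log, pass an open file object for toys.log instead of the sample list.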

Dashboard Layout

Organize visualizations:

  • First row: Key metrics (Total, Quality, Rate)
  • Second row: Toy type distribution and Line performance
  • Third row: Environmental conditions

Resize as needed. The goal is to make the dashboard easy to interpret at a glance.

Proactive Monitoring with SLOs and Alerts

Ho ho ho! Let’s ensure quality remains high! Let’s configure a Service Level Objective (SLO).

Creating a Quality SLO

  1. In Observability → SLOs, click Create SLO
  2. SLI Definition:
    • SLI type: Custom Query
    • Data view: logs-*
    • Timestamp field: @timestamp
    • Query filter: machine_status:"normal"
    • Good query: quality_score >= 90
    • Total query: quality_score:*
    • Group by: production_line
  3. Objectives:
    • Time window: Rolling
    • Duration: 30 days
    • Budgeting method: Occurrences
    • Target / SLO (%): 94
  4. Description:
    • Name: "Quality Control SLO"
    • Description: "Monitoring production quality (goal: 94% with score ≥ 90)"
    • Tags: quality, production, christmas-2024
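With the occurrences budgeting method, the SLI is the number of good events divided by the total number of events. The sketch below applies the filter, good query, total query, and grouping from the definition above to a handful of in-memory events. The function `sli_by_line` and the sample data are illustrative assumptions, not output from the workshop.

```python
from collections import defaultdict
from typing import Dict, Iterable

TARGET = 0.94     # Target / SLO (%): 94
QUALITY_MIN = 90  # Good query: quality_score >= 90


def sli_by_line(events: Iterable[dict]) -> Dict[str, float]:
    """Good events divided by total events, grouped by production_line."""
    good = defaultdict(int)
    total = defaultdict(int)
    for event in events:
        if event.get("machine_status") != "normal":
            continue  # query filter: machine_status:"normal"
        if "quality_score" not in event:
            continue  # total query: quality_score:*
        line = event["production_line"]
        total[line] += 1
        if event["quality_score"] >= QUALITY_MIN:
            good[line] += 1
    return {line: good[line] / total[line] for line in total}


events = [
    {"production_line": "line1", "machine_status": "normal", "quality_score": 95},
    {"production_line": "line1", "machine_status": "normal", "quality_score": 88},
    {"production_line": "line1", "machine_status": "error", "quality_score": 69},
    {"production_line": "line2", "machine_status": "normal", "quality_score": 97},
]
for line, sli in sorted(sli_by_line(events).items()):
    status = "meets" if sli >= TARGET else "misses"
    print(f"{line}: SLI {sli:.0%} {status} the {TARGET:.0%} target")
```

Note that events with machine_status "error" are dropped by the query filter, so they count neither as good nor as total for this SLO.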

Setting Up Alerts

We’ll set clear goals:

  • Gift Quality SLO:
    • Target: 99% of toys with a quality score ≥ 90
    • 24-hour window
    • 30-day evaluation
  • Production Efficiency SLO:
    • Target: ≥ 95% of planned production
    • Real-time monitoring

Creating an Alert Rule

  1. Observability → Alerts → Create rule
  2. Type: "SLO burn rate"
  3. Alert Configuration:
    • Name: "Low Quality Alert"
    • SLO: select the created SLO
    • Alert if it falls below the goal
    • Frequency: every 5 minutes
    • Notifications: Email to the Head Elf, production Slack channel, or webhook for tickets
  4. Alert Message:
    • Title: ":christmas_tree: Production Quality Alert"
    • Message: "Attention elves! Line {{production_line}} is below the quality target ({{current_value}}%). Check immediately!"
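A burn rate rule looks at how fast the error budget is being consumed. As commonly defined, the burn rate is the observed error rate divided by the error rate the SLO allows. The sketch below uses the 94% target from the SLO created earlier. The function `burn_rate` is a hypothetical helper, and the windows and thresholds of the actual Kibana rule are not covered here.

```python
def burn_rate(bad: int, total: int, target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if total == 0:
        return 0.0  # no events, so no budget is being consumed
    error_budget = 1.0 - target  # for a 94% target, 6% of events may be bad
    return (bad / total) / error_budget


# 12 bad toys out of 100 against a 94% target
rate = burn_rate(bad=12, total=100, target=0.94)
print(f"burn rate: {rate:.1f}")  # prints "burn rate: 2.0"
```

A burn rate of 1 means the budget would last exactly the full SLO window; a burn rate of 2 means it would be exhausted in half that time.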

Example Alert in Action

{
  "@timestamp": "2024-12-11T16:54:40.999778",
  "toy_type": "puzzle",
  "production_line": "line1",
  "production_rate": 103,
  "quality_score": 69,
  "elf_id": "elf_4",
  "errors_detected": 5,
  "temperature": 20.8,
  "humidity": 50.9,
  "shift": "evening",
  "machine_status": "error",
  "toys_completed": 25,
  "version": "8.16.1"
}

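In this event, quality_score dropped to 69, well below the threshold of 90, with 5 errors detected and machine_status set to "error". An alert on this condition helps the elves fix the issue quickly.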

Configuring Alerts via Elasticsearch Query

  1. Stack Management → Rules → Create rule
  2. Type: "Elasticsearch query"
  3. Query:
    {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "quality_score": {
                  "lt": 90
                }
              }
            }
          ]
        }
      }
    }
    
  4. Parameters:
    • Name: "Gift Quality Alert"
    • Indices: logs-*
    • Schedule: Every 5 minutes
    • Actions: Notify supervisors (email/Slack)

SLO Panel

Our SLO panel will show quality trends, maintaining the Christmas spirit and children’s happiness.

Conclusion: A Modern and Magical Workshop! :christmas_tree::sparkles:

Our complete solution provides:

  1. Data collection for production :white_check_mark:
  2. Elastic Agent setup :white_check_mark:
  3. Visualization dashboard :white_check_mark:
  4. Proactive SLOs and alerts :white_check_mark:

Santa’s workshop now combines magic and technology! “This system has transformed gift production,” celebrates Bernard, Chief Operations Elf. “Now we ensure every child gets their perfect gift on time!”

Additional Resources

Remember: The best gifts arrive at the right time, and the best monitoring system makes that possible! :gift::sparkles:

#ElasticAdvent #ModernSanta #ObservabilityMagic #ElasticsearchForElves #Elastic
