Filebeat incorrect metrics

Hello! I'm testing Filebeat while Logstash is unavailable, but the Filebeat metrics I get from the HTTP endpoint don't match what I configured. There is no documentation describing the fields in the metrics, so I'm not sure I understood each term correctly. Let me know if I misunderstood any.

Setup:
Versions: Logstash 6.5.4, Filebeat 6.5.4.
Logstash is not running, so the output is blocked for Filebeat.

  1. The queue.mem.events = 64, but pipeline.active = 85 and pipeline.total = 171. Shouldn't the total be at most 64? Doesn't Filebeat stop reading log files when the internal memory queue is full?
  2. Why is filebeat.events.done = 2? The output is blocked, so there shouldn't be any events done.
  3. What's the difference between active and published events?
  4. When I query internal metrics from the HTTP endpoint, does it run a one-time script to collect them, or are they stored somewhere on the server? If they are stored in a permanent place, is that space efficient? Are the metrics calculated as deltas since the last query, or cumulatively over the entire time Filebeat has been running?
  5. Because Filebeat is installed on the application server, is there any way to get internal metrics without enabling the HTTP endpoint?
  6. When the output is blocked, will Filebeat cause a lot of CPU usage?

I couldn't find much documentation related to my questions above. Any help or explanation would be appreciated! Thanks!

# enable http endpoint to monitor internal state of filebeat
http.enabled: true


# internal queue
queue:
  mem:
    events: 64
    flush.min_events: 2
    flush.timeout: 5s


filebeat.inputs:
- type: log
  exclude_files: ['\.gz$']
  close_renamed: true
  clean_removed: true
  harvester_limit: 10
  scan_frequency: 20s

  # default close_inactive: 5s; a new file handle is opened when the file is modified again
  # clean_removed is enabled by default
  paths:
      - /logs
  multiline.pattern: 'START:'
  multiline.negate: true
  multiline.match: after
  multiline.flush_pattern: 'END:'

processors:
  - drop_event:
      when:
        not:
           contains:
              message: "START"
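The multiline settings above mean: lines that do NOT match the pattern (negate: true) are appended after the previous matching line (match: after), and flush_pattern closes an event as soon as it appears. A rough Python sketch of that grouping (my own approximation, not Filebeat's actual implementation):

```python
import re

# Approximation of multiline with negate: true, match: after.
# Non-matching lines attach to the previous START: line;
# a line matching flush_pattern ends the event immediately.
start = re.compile(r'START:')
flush = re.compile(r'END:')

def group_multiline(lines):
    events, buf = [], []
    for line in lines:
        if start.search(line):
            if buf:                      # a new START: flushes the previous event
                events.append('\n'.join(buf))
            buf = [line]
        elif buf:
            buf.append(line)             # continuation line, appended after
        if buf and flush.search(line):   # END: flushes the current event
            events.append('\n'.join(buf))
            buf = []
    if buf:                              # flush whatever is left at EOF
        events.append('\n'.join(buf))
    return events
```

With input like `["START: a", "detail", "END: a", "START: b"]` this yields two events: the first spanning three lines, the second a single line.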

  "filebeat": {
    "events": {
      "active": 169,
      "added": 171,
      "done": 2
    },
    "harvester": {
      "closed": 0,
      "open_files": 1,
      "running": 1,
      "skipped": 0,
      "started": 1
    },
    "input": {
      "log": {
        "files": {
          "renamed": 0,
          "truncated": 0
        }
      }
    }
  },
  "libbeat": {
    "config": {
      "module": {
        "running": 0,
        "starts": 0,
        "stops": 0
      },
      "reloads": 0
    },
    "output": {
      "events": {
        "acked": 0,
        "active": 0,
        "batches": 0,
        "dropped": 0,
        "duplicates": 0,
        "failed": 0,
        "total": 0
      },
      "read": {
        "bytes": 0,
        "errors": 0
      },
      "type": "logstash",
      "write": {
        "bytes": 0,
        "errors": 0
      }
    },
    "pipeline": {
      "clients": 1,
      "events": {
        "active": 85,
        "dropped": 0,
        "failed": 0,
        "filtered": 86,
        "published": 84,
        "retry": 2,
        "total": 171
      },
      "queue": {
        "acked": 0
      }
    }
  },

The queue.mem.events = 64, but pipeline.active = 85 and pipeline.total = 171. Shouldn't the total be at most 64? Doesn't Filebeat stop reading log files when the internal memory queue is full?

It will stop at 64, but the inputs still create events and then block while pushing them into the queue, so those in-flight events are part of the numbers.
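In other words, the memory queue behaves like a bounded queue: once it holds 64 events, producers block until a consumer drains it. A toy illustration (not Filebeat code):

```python
import queue

# Toy model: a bounded queue of 64 events with no consumer running.
q = queue.Queue(maxsize=64)

for i in range(64):
    q.put(i, block=False)   # fills the queue to capacity

# With nobody consuming, the 65th event cannot enter the queue;
# the producer would block (here we use block=False to see the failure).
blocked = False
try:
    q.put(64, block=False)
except queue.Full:
    blocked = True
```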

Why is filebeat.events.done = 2? The output is blocked, so there shouldn't be any events done. What's the difference between active and published events?

Active events are in memory or in transit to the queue.
Published events have been sent out of Filebeat to the output (Logstash/Elasticsearch).

When I query internal metrics from the HTTP endpoint, does it run a one-time script to collect them, or are they stored somewhere on the server? If they are stored in a permanent place, is that space efficient? Are the metrics calculated as deltas since the last query, or cumulatively over the entire time Filebeat has been running? Because Filebeat is installed on the application server, is there any way to get internal metrics without enabling the HTTP endpoint?

They are mostly counter values, so in that sense they are as space efficient as ints can be :slight_smile:
I think the HTTP endpoint is more real-time with the values, but we also report metric snapshots in the logs every 30 seconds.
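If enabling the HTTP endpoint is not an option, those periodic log snapshots can be tuned; assuming the standard Beats logging.metrics settings, the relevant config fragment looks like:

```yaml
# Write internal metric snapshots to the Filebeat log
# (these are the standard Beats logging settings; 30s is the usual default).
logging.metrics.enabled: true
logging.metrics.period: 30s
```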

When the output is blocked, will Filebeat cause a lot of CPU usage?

This should not be the case: Filebeat will stop reading files and will keep trying to reconnect to the output with backoff, so it should not use too much CPU.
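The backoff mentioned here is typically exponential with a cap, so reconnect attempts quickly become infrequent. A generic sketch (illustrative only; these are not Filebeat's exact parameters):

```python
def backoff_delays(initial=1.0, maximum=60.0, attempts=8):
    """Generic capped exponential backoff: each retry waits twice as long,
    up to a maximum. Illustrative values, not Filebeat's actual settings."""
    delay, out = initial, []
    for _ in range(attempts):
        out.append(delay)
        delay = min(delay * 2, maximum)
    return out
```

Because the delay doubles on each failure, a blocked output means Filebeat spends almost all its time sleeping between attempts rather than burning CPU.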

Hello! Thanks for your detailed response! There are still some metrics that seem incorrect to me.

filebeat.yml

filebeat.inputs:
- type: log
  exclude_files: ['\.gz$']
  scan_frequency: 1m

  multiline.pattern: '^\[[^\]]+\] START:'
  multiline.negate: true
  multiline.match: after
  multiline.flush_pattern: 'END:'

processors:
  - drop_event:
      when:
        not:
          regexp:
            message: '^\[[^\]]+\] START:'

Metrics:

  "filebeat": {
    "events": {
      "active": 0,
      "added": 1068,
      "done": 1068
    },
    "harvester": {
      "closed": 0,
      "open_files": 1,
      "running": 1,
      "skipped": 0,
      "started": 1
    },
    "input": {
      "log": {
        "files": {
          "renamed": 0,
          "truncated": 0
        }
      },
      "netflow": {
        "flows": 0,
        "packets": {
          "dropped": 0,
          "received": 0
        }
      }
    }
  },
  "libbeat": {
    "config": {
      "module": {
        "running": 0,
        "starts": 0,
        "stops": 0
      },
      "reloads": 0
    },
    "output": {
      "events": {
        "acked": 533,
        "active": 0,
        "batches": 2,
        "dropped": 0,
        "duplicates": 0,
        "failed": 0,
        "total": 533
      },
      "read": {
        "bytes": 12,
        "errors": 0
      },
      "type": "logstash",
      "write": {
        "bytes": 77038,
        "errors": 0
      }
    },
    "pipeline": {
      "clients": 1,
      "events": {
        "active": 0,
        "dropped": 0,
        "failed": 0,
        "filtered": 535,
        "published": 533,
        "retry": 512,
        "total": 1068
      },
      "queue": {
        "acked": 533
      }
    }
  },

  1. In pipeline.events, it shows filtered = 535 and published = 533. Why are there two extra filtered events?
  2. In filebeat.yml I used a drop_event processor, and I verified that with the processor in place Logstash receives fewer events. So some events must be dropped by Filebeat, yet pipeline.events.dropped = 0. I don't think that metric is correct. Any ideas?
  3. Is pipeline.events.total just the sum of the filtered and published events?
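Checking question 3 against this snapshot myself: with active = 0, filtered + published does equal total here (and published equals queue.acked), though I can't say whether that identity is guaranteed in general:

```python
# Counters copied from the second snapshot (my inference, not documented behaviour).
filtered, published, active, acked, total = 535, 533, 0, 533, 1068

assert published == acked                  # everything published was acked by the output
assert filtered + active + acked == total  # 535 + 0 + 533 == 1068
```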

Thanks!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.