Beginner here; how do I

I'm new to this stack ecosystem and I was hoping someone could point me to the elements that might fit my project. I have a command that I'd like to run on a schedule (like every 10 minutes) that generates some good metrics data that I'd like to store and graph over time. The data is in json format with a date stamp. But I'm drowning in all the "beats" and everything else; it's kind of overwhelming. I have Elasticsearch and kibana installed and running. What module do I need to configure to run the command and store the data? How do I use kibana to access the data and create a graph? I'm pretty sure I could write a query to show what I'm looking for - I just don't know how to do it in the tool.

Welcome to our community! :smiley:

Your best bet would be to use something like cron to run the script, store the output in a file, then use Filebeat to read that and send to Elasticsearch. Though because this is a custom source, you won't have a module that can handle it.
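
For the cron part, a minimal sketch might look like the following. It assumes a POSIX shell; the function name run_and_store, the script path in the crontab comment, and my_metrics_command are placeholders I made up for illustration, not anything Filebeat or cron provides.

```shell
#!/bin/sh
# Sketch: run a command and store its output in a timestamped file.
# Hypothetical crontab entry, assuming this script is saved as
# /usr/local/bin/collect-metrics.sh and should run every 10 minutes:
#   */10 * * * * /usr/local/bin/collect-metrics.sh

# run_and_store DIR COMMAND [ARGS...]
# Writes COMMAND's stdout to DIR/<UTC timestamp>.log and prints that path.
run_and_store() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    out_file="$out_dir/$(date -u +%Y-%m-%dT%H-%M-%SZ).log"
    # Write under a temporary name first, then rename, so a reader that
    # watches *.log never picks up a half-written file.
    if "$@" > "$out_file.tmp"; then
        mv "$out_file.tmp" "$out_file"
    else
        rm -f "$out_file.tmp"
        return 1
    fi
    printf '%s\n' "$out_file"
}

# Example call; "my_metrics_command" stands in for your own command:
# run_and_store /var/log/metrics my_metrics_command
```

The timestamp in the file name avoids spaces and colons, which keeps the paths easy to glob and quote later on.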

Once the data is in Elasticsearch, you can use the default filebeat-* pattern to view the data and create visualisations.

To get you used to the dashboarding, you can try enabling one of the modules (eg system) and then see how it does things, and copy them from there.

Ok, so I think I need a more detailed step-by-step. I think I've got filebeat installed. How do I configure it to pick up & parse files into Elasticsearch?

Filebeat quick start: installation and configuration | Filebeat Reference [7.16] | Elastic is a great place to start.

Ok, so I'm to the point where I'm configuring inputs. As you noted, it's a custom format, stored as json. The section that talks about manually setting up an input seems to be sparse on info about how to parse a log file, though, and still directs me to use modules. How do I tell it to parse the data?

You'll want:

  1. filestream input | Filebeat Reference [7.16] | Elastic
  2. possibly filestream input | Filebeat Reference [7.16] | Elastic (its ndjson parser) to parse the json
  3. Configure the Elasticsearch output | Filebeat Reference [7.16] | Elastic to send to Elasticsearch

I would see how well it works with the ndjson parsing, and if it's not doing what you want then we can look at other options.

Ok, since I've got to specify the format for parsing, is there a way to repeatedly run it and see the output; check the results?

In that case, use the stdin input and cat the files in; it saves having to stuff around with the registry (which tracks processed files).
Then output via the console.

Sure, I can do that. Am I putting the config in /etc/filebeat/filebeat.yml and running /usr/share/filebeat/bin/filebeat? Just cat a sample record in, and it will output the results.

Can you also help me understand how this works? Am I just storing the log data in Elasticsearch? How/where do I write a query to show aggregation?

That should be sufficient, yes.

Not yet, you're just testing your config and the parsing of things. When you use the Elasticsearch output, then yes it will be stored there, and then you can query with Kibana.

I'm not getting output, but I think it's because the logs are stored one per file, and the json spans multiple lines. How do I configure it to process the whole file as one "record"?

It'd be great if you could share your config, as well as a log sample. That level of detail makes it a lot easier to help you.

Right, makes sense. Here's a sample record, and where I am with the config. I appreciate the help.

In your config, line 25 needs to be true or else it won't use stdin, then comment out line 28. eg as follows, and I commented out the Kibana setup stuff just for ease.

filebeat.yml
###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# stdin reads events from standard input (swapped in for filestream while testing).
- type: stdin

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # paths:
    #- /var/log/speedtest/2022-01-06\ 22:00:01.log
    #- /var/log/speedtest/*.log
    #- c:\programdata\elasticsearch\logs\*
  json.keys_under_root: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
# setup.kibana:
  # host: "localhost:5601"
  # username: "elastic"
  # password: "password"
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ------------------------------- Console Output -------------------------------
# Print events to stdout while testing the input and parsing.
output.console:
  pretty: true

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Then, with the json in the raw file, you run cat filename | filebeat -e -c path/to/filebeat.yml, and you should see output.

Right, I'm getting output. I'm not sure how/why enabled was set to false. I also note that the command stays open despite cat having finished, but that probably won't matter once it's changed back to filestream.

It appears that it's not really happy parsing the file; I presume it's because the data is multiline. So, I stripped the newlines out, but then it wouldn't complete without a final one to end the record. Once I sorted that out, it looks like the result is a merged version of the json data with additional Elasticsearch properties. I assume this is what I'm looking for and will send to Elasticsearch with output.elasticsearch:?

Is there a way to tell filebeat to expect and parse a multiline file? Or is there a way to preprocess it to strip the \ns? I don't know if I can alter the way the records are generated.

Exactly.

Yep.

There is Manage multiline messages | Filebeat Reference [7.16] | Elastic that might help.

It's telling me I need to specify a json.message_key. What should I use? Why would it be required for multiline vs if I stripped the \n out?

So, I took a leap and assigned an arbitrary message_key thinking that it meant it needed something to set the parsed data to. However, now I'm getting the whole escaped json packet stuffed under the "message" property. I doubt that's what I'm looking for, right? I need the json parsed in order to send it to Elasticsearch for storing?

I think the multiline processing isn't set up to both consume a multiline, one-record-per-file log and parse it as json. I think I had better results just stripping the \n out before ingesting the file. The problem is, if I use the filestream input, is there a way I can pre-process the file beforehand? Like pipe the file through a bash command first?
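
One way to do that pre-processing outside Filebeat, sketched under the assumption that each file holds exactly one JSON document: delete the raw line breaks and end the record with a single newline, which is what the line-oriented json parsing needed. The function name flatten_json and the paths in the comments are made up for illustration.

```shell
#!/bin/sh
# flatten_json FILE
# Prints FILE's content as one line terminated by a single newline.
# Assumes FILE holds exactly one JSON document. Valid JSON cannot contain
# raw line breaks inside strings, so deleting them leaves the data intact.
flatten_json() {
    tr -d '\r\n' < "$1"
    printf '\n'
}

# Example: flatten every per-run file into one NDJSON file
# (both paths are placeholders):
# for f in /var/log/speedtest/*.log; do
#     flatten_json "$f"
# done >> /var/log/speedtest/flat/metrics.ndjson
```

The flattened output can be piped into the stdin input for testing, or appended to a file that a filestream input watches.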

Another option is I could run all the files through a processor before filebeat is called. Is there a way to archive, delete or rotate the logs that have been processed?
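
As far as I know, Filebeat does not delete or move the files it has read, so the archiving would have to happen in the same script or cron job that does the pre-processing. A rough sketch, assuming the raw files end in .log; the function name archive_processed and the directory names are hypothetical.

```shell
#!/bin/sh
# archive_processed SRC_DIR ARCHIVE_DIR
# Moves every *.log file in SRC_DIR into ARCHIVE_DIR, creating it if needed.
# Other files in SRC_DIR are left alone.
archive_processed() {
    src_dir=$1
    archive_dir=$2
    mkdir -p "$archive_dir" || return 1
    for f in "$src_dir"/*.log; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv "$f" "$archive_dir"/ || return 1
    done
}

# Example call (placeholder paths), run after the files have been processed:
# archive_processed /var/log/speedtest /var/log/speedtest/archive
```

Keeping the archive directory outside the paths that Filebeat watches prevents the moved files from being picked up again.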