Beginner here; how do I

end-user · December 21, 2021, 2:26am

I'm new to this stack ecosystem and I was hoping someone could point me to the elements that might fit my project. I have a command that I'd like to run on a schedule (like every 10 minutes) that generates some good metrics data that I'd like to store and graph over time. The data is in JSON format with a date stamp. But I'm drowning in all the "Beats" and everything else; it's kind of overwhelming. I have Elasticsearch and Kibana installed and running. What module do I need to configure to run the command and store the data? How do I use Kibana to access the data and create a graph? I'm pretty sure I could write a query to show what I'm looking for; I just don't know how to do it in the tool.

warkolm · December 21, 2021, 2:54am

Welcome to our community!

Your best bet would be to use something like cron to run the script, store the output in a file, then use Filebeat to read that and send to Elasticsearch. Though because this is a custom source, you won't have a module that can handle it.
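A minimal sketch of the cron-plus-file approach described above, written in Python. It assumes the command prints a single JSON document to stdout; the script name, command, and paths in the comments are hypothetical placeholders, not anything from this thread. Each record is written as one compact line, because Filebeat's JSON decoding works on one JSON object per line.

```python
import json
import subprocess
import sys
from pathlib import Path

# Hypothetical crontab entry (every 10 minutes); script, paths, and command are placeholders:
# */10 * * * * /usr/bin/python3 /opt/metrics/collect.py /var/log/mymetrics/metrics.ndjson my-metrics-command


def append_metric(command, log_path):
    """Run `command`, parse its stdout as JSON, and append it as one line."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    record = json.loads(result.stdout)  # Raises ValueError if the output is not JSON
    line = json.dumps(record, separators=(",", ":"))  # Compact, single-line form
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")  # The trailing newline terminates the record
    return line


# Only runs when called with a log path followed by the command and its arguments.
if __name__ == "__main__" and len(sys.argv) > 2:
    append_metric(sys.argv[2:], sys.argv[1])
```

Appending to a single file means Filebeat only has one path to watch, and it picks up each new line as it is written.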

Once the data is in Elasticsearch, you can use the default filebeat-* pattern to view the data and create visualisations.

To get you used to the dashboarding, you can try enabling one of the modules (eg system) and then see how it does things, and copy them from there.

end-user · December 31, 2021, 1:49am

Ok, so I think I need a more detailed step-by-step. I think I've got filebeat installed. How do I configure it to pick up & parse files into Elasticsearch?

warkolm · December 31, 2021, 2:23am

The "Filebeat quick start: installation and configuration" page in the Filebeat Reference [7.16] is a great place to start.

end-user · January 7, 2022, 3:59am

Ok, so I'm to the point where I'm configuring inputs. As you noted, it's a custom format, stored as json. The section that talks about manually setting up an input seems to be sparse on info about how to parse a log file, though, and still directs me to use modules. How do I tell it to parse the data?

warkolm · January 9, 2022, 11:19pm

You'll want:

filestream input | Filebeat Reference [7.16] | Elastic
possibly filestream input | Filebeat Reference [7.16] | Elastic to parse the JSON
Configure the Elasticsearch output | Filebeat Reference [7.16] | Elastic to send to Elasticsearch

I would see how well it works with the ndjson parsing, and if it's not doing what you want then we can look at other options.

end-user · January 9, 2022, 11:59pm

Ok, since I've got to specify the format for parsing, is there a way to repeatedly run it and see the output; check the results?

warkolm · January 10, 2022, 1:46am

In that case use stdin and then cat the files in, it saves having to stuff around with the registry (that tracks processed files).
And then output via the console.

end-user · January 10, 2022, 2:24am

Sure, I can do that. Am I putting the config in /etc/filebeat/filebeat.yml and running /usr/share/filebeat/bin/filebeat? Just cat a sample record in, and it will output the results.

Can you also help me understand how this works? Am I just storing the log data in Elasticsearch? How/where do I write a query to show aggregation?

warkolm · January 10, 2022, 4:23am

That should be sufficient, yes.

Not yet, you're just testing your config and the parsing of things. When you use the Elasticsearch output, then yes it will be stored there, and then you can query with Kibana.
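For the aggregation question, here is a hedged sketch of what such a query could look like once the data is in Elasticsearch. It builds a search body that averages one numeric field per time bucket. The field name download_mbps is purely hypothetical (the real field names depend on your JSON), and it assumes events carry the @timestamp field that Filebeat adds.

```python
import json


def build_metric_histogram(field, interval="10m", time_field="@timestamp"):
    """Build an Elasticsearch _search body averaging `field` per time bucket."""
    return {
        "size": 0,  # Return only aggregation buckets, not individual documents
        "aggs": {
            "over_time": {
                "date_histogram": {"field": time_field, "fixed_interval": interval},
                "aggs": {"average": {"avg": {"field": field}}},
            }
        },
    }


# "download_mbps" is a hypothetical field name used only for illustration.
print(json.dumps(build_metric_histogram("download_mbps"), indent=2))
```

The printed body can be pasted into the Kibana Dev Tools console after a line such as GET filebeat-*/_search. For graphs, building a visualisation on the same index pattern in Kibana does the equivalent without writing the query by hand.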

end-user · January 12, 2022, 2:57am

I'm not getting output, but I think it's because the logs are stored one per file, and the JSON spans multiple lines. How do I configure it to process the whole file as one "record"?

warkolm · January 12, 2022, 3:07am

It'd be great if you could share your config, as well as a log sample. Providing that level of detail makes things a lot easier to help you.

end-user · January 13, 2022, 2:29am

Right, makes sense. Here's a sample record, and where I am with the config. I appreciate the help.

warkolm · January 13, 2022, 5:09am

In your config, line 25 needs to be true or else it won't use stdin, then comment out line 28. eg as follows, and I commented out the Kibana setup stuff just for ease.

filebeat.yml

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# stdin reads events from standard input (swapped in for filestream while testing).
- type: stdin

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # paths:
    #- /var/log/speedtest/2022-01-06\ 22:00:01.log
    #- /var/log/speedtest/*.log
    #- c:\programdata\elasticsearch\logs\*
  json.keys_under_root: true
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false


# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
# setup.kibana:
  # host: "localhost:5601"
  # username: "elastic"
  # password: "password"
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

# ------------------------------- Console Output -------------------------------
# Print events to stdout while testing the config.
output.console:
  pretty: true

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

Then, with the JSON in the raw file, run cat filename | filebeat -e -c path/to/filebeat.yml, and you should see output.

end-user · January 13, 2022, 3:14pm

Right, I'm getting output. I'm not sure how/why enabled was set to false. I also note that the command stays open despite cat having finished, but that probably won't matter once it's changed back to filestream.

It appears that it's not really happy parsing the file; I presume it's because the data is multiline. So I stripped the newlines out, but then it wouldn't complete without a final one to end the record. Once I sorted that out, it looks like the result is a merged version of the JSON data with additional Elasticsearch properties. I assume this is what I'm looking for and will send to Elasticsearch with output.elasticsearch:?

Is there a way to tell filebeat to expect and parse a multiline file? Or is there a way to preprocess it to strip the \ns? I don't know if I can alter the way the records are generated.
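One way the pre-processing step could look, as a sketch rather than a recommended setup: parse each file as JSON and re-emit it on a single line with the trailing newline that ends the record. It assumes each file holds exactly one JSON document; the script name compact.py in the usage note is hypothetical.

```python
import json
import sys
from pathlib import Path


def compact_json(text):
    """Return the JSON document in `text` as one line ending in a newline."""
    record = json.loads(text)  # Fails loudly if the file is not valid JSON
    return json.dumps(record, separators=(",", ":")) + "\n"


# Only runs when file paths are given; each file becomes one output line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        sys.stdout.write(compact_json(Path(name).read_text(encoding="utf-8")))
```

For testing against the stdin input, the output can be piped straight in, for example python3 compact.py filename | filebeat -e -c path/to/filebeat.yml. Re-serialising through a JSON parser is a little safer than deleting newline characters by hand, since it also rejects files that are not valid JSON.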

warkolm · January 18, 2022, 12:44am

Exactly.

Yep.

The "Manage multiline messages" page in the Filebeat Reference [8.11] might help.

end-user · January 20, 2022, 3:16am

It's telling me I need to specify a json.message_key. What should I use? Why would it be required for multiline vs if I stripped the \n out?

end-user · January 22, 2022, 1:44am

So, I took a leap and assigned an arbitrary message_key thinking that it meant it needed something to set the parsed data to. However, now I'm getting the whole escaped json packet stuffed under the "message" property. I doubt that's what I'm looking for, right? I need the json parsed in order to send it to Elasticsearch for storing?

I think the multiline processing isn't set up to both consume a multiline, one-record-per-file log and parse it as JSON. I think I had better results just stripping the \n out before ingesting the file. The problem is, if I use the filestream consumer, is there a way I can pre-process the file beforehand? Like pipe the file through a bash command first?

Another option is I could run all the files through a processor before filebeat is called. Is there a way to archive, delete or rotate the logs that have been processed?
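A sketch of that "processor before Filebeat" idea, under stated assumptions: every *.log file in a source directory holds one JSON document, the combined output file lives outside that directory, and the directory names are whatever you choose (none of them come from Filebeat itself). Each record is appended as one line to a file Filebeat can watch, and the original is moved to an archive directory so it is not processed twice.

```python
import json
import shutil
from pathlib import Path


def ingest_and_archive(source_dir, ndjson_path, archive_dir):
    """Append each *.log record to ndjson_path, then move the original away."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    processed = []
    with Path(ndjson_path).open("a", encoding="utf-8") as out:
        for source in sorted(Path(source_dir).glob("*.log")):
            try:
                record = json.loads(source.read_text(encoding="utf-8"))
            except ValueError:
                continue  # Leave unparseable files in place for inspection
            out.write(json.dumps(record, separators=(",", ":")) + "\n")
            shutil.move(str(source), str(archive / source.name))
            processed.append(source.name)
    return processed
```

Run from the same cron job that generates the records, this leaves Filebeat with a single append-only file of one-line JSON events, and the archive directory can be pruned on whatever schedule suits you.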

warkolm · January 24, 2022, 3:27am

We'd need to see an example of your raw json event to suggest what to use for message_key. Let's start there.

end-user · January 24, 2022, 4:16am

I'm not sure I understand. I shared the output that I get. Are you asking what command I run?
