Parse JSON data from a log file into Kibana via Filebeat and Logstash

I am using Filebeat and Logstash to parse a JSON log file into Kibana.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /home/tiennd/filebeat/logstash/*.json
  json.keys_under_root: true

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - decode_json_fields:
      fields: ["inner"]

JSON log file's content:

{ "outer": "value", "inner": "{\"data\": \"value\"}" }

I tried to use Logstash pipelines, but that did not work.
My error: Provided Grok expressions do not match field value: [{ \"outer\": \"value\", \"inner\": \"{\\\"data\\\": \\\"value\\\"}\" }]
How can I set up JSON parsing for Filebeat?
Thanks, everyone!

Hi @theFatCat, welcome to the Elastic community forums!

A few questions to get us started:

  1. What version of Filebeat and Logstash are you using?

  2. Exactly where are you seeing that error? In which log file or from which process's output?

  3. Can you post your complete filebeat.yml please? Please make sure to redact any sensitive information from it before posting.

  4. Can you post your Logstash pipeline please? Please make sure to redact any sensitive information from it before posting.

Thanks,

Shaunak

  1. My Filebeat version is 7.6.2 (amd64).
  2. I see the errors in the Kibana dashboard on Elastic Cloud, in the error.message field.
  3. My filebeat.yml is located in /etc/filebeat:
filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/tiennd/filebeat/logstash/*.json
  json.keys_under_root: true

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    level: info
    review: 1

  ### Multiline options

  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false

  # Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after


#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging


#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

#============================= Elastic Cloud ==================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
cloud.id: "my-cloud-id"


# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `<user>:<pass>`.
cloud.auth: "my-cloud-auth"

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - decode_json_fields:
      fields: ["inner"]
  4. The type of my old data is plain text, so I tried using Logstash pipelines in Elastic Cloud, but it did not work. So I converted my data into JSON.
    Example:
input {
  file {
    path => "/home/tiennd/filebeat/logstash/*.log"
  }
}

filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

Many thanks!

Thanks for posting the configurations. Looking at them, I'm not following the relationship between Filebeat and Logstash. Could you explain how the two are related to each other, please?

Why is there no output section in your Logstash pipeline? Are you sending the output to STDOUT? Can you post some sample output here? Also, please post a couple of lines from one of the /home/tiennd/filebeat/logstash/*.log files.
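
(If you just want to see what Logstash produces while debugging, a minimal output section such as the following is usually enough:)

output {
  stdout { codec => rubydebug }
}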

What is creating the *.json files under /home/tiennd/filebeat/logstash/? Can you post a couple of lines from one of these files here, please?

Sorry for the late response.
I enabled the logstash module via the command line: sudo filebeat modules enable logstash, and the result was Module logstash is already enabled.
My JSON file in the logstash input folder:

{ "outer": "value", "inner": "{\"data\": \"value\"}" }
{ outer: value, inner: {"data": "value"} }
{ "outer": "value", "inner": "{\"data\": \"value\"}" }
{ "outer": "value", "inner": "{\"data\": \"value\"}" }
{ "outer": "value", "inner": "{\"data\": \"value\"}" }

My old data is plain text, which I placed in .log files, but my new data is JSON.
I do not need to print the data out, so I removed the output section from the Logstash pipeline. But the important thing is that I do not see how the two are connected. Can you show me the fastest way to get information from a local log file into a Kibana dashboard?

The logstash module in Filebeat is intended for ingesting logs produced by a running Logstash node. I don't think this is what you want in your case.

I'd like to take a step back at this point and check some of my assumptions about what you are trying to achieve.

You have some log files in /home/tiennd/filebeat/logstash/. You want to ingest these into Elasticsearch so you can then visualize/analyze the logs in Kibana.

Some of the log files in that folder are older ones, which have the .log extension. Are you trying to ingest these or not? If you are, could you please post a couple of sample log lines from these files?

Some of the other log files in the same folder are newer ones, which have the .json extension. You want to ingest these. Could you please post a couple of sample log lines from these files?

In general, if I'm understanding your use case correctly above, you aren't going to need Logstash at all. You also don't want the logstash module of Filebeat (as explained earlier).

All you need is Filebeat with an input section for ingesting the .json log files and potentially another input section for ingesting the .log files. If you answer my questions above I can help you construct these input sections.
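
For example, the inputs part of your filebeat.yml could end up looking roughly like this (the paths and the json setting are taken from what you've posted so far; treat it as a sketch until I've seen your answers):

filebeat.inputs:
  - type: log
    paths:
      - /home/tiennd/filebeat/logstash/*.json
    json.keys_under_root: true
  - type: log
    paths:
      - /home/tiennd/filebeat/logstash/*.log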

Shaunak

My content in .log file:

04/20/2020, 5:44:47 PM   Room X has 3 people
04/20/2020, 5:44:51 PM   Room X has 5 people 
04/20/2020, 5:45:47 PM   Room X has 7 people 
04/20/2020, 5:51:01 PM   Room X has 20 people 

Each process will write a separate log file in /home/tiennd/filebeat/logstash/. For example, RoomService will create a log file named /home/tiennd/filebeat/logstash/room.log.
I need to push the contents of those log files to Elasticsearch, then analyze them and visualize them with Kibana. For example, a chart over time of the number of people in the room whose id is X. Is that possible?

Thanks for posting the sample. Based on that, you should be able to ingest and parse this data using just Filebeat and Elasticsearch. Here is the broad outline of what you will need to do, along with links to relevant documentation:

  1. Create a Filebeat log input pointing to your logs folder. See https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html. This will tell Filebeat to watch that folder and periodically harvest new files as they appear.

  2. Set up the dissect processor in Filebeat to parse out various fields from each log entry to create a structured event/document. See https://www.elastic.co/guide/en/beats/filebeat/current/dissect.html.

  3. Use the elasticsearch Filebeat output to send your events/documents to Elasticsearch. See https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html. A rough sketch combining all three steps is shown below.
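
To make that concrete, here is a minimal filebeat.yml sketch. The tokenizer, field names, and target_prefix below are only guesses based on the sample lines you posted, so adjust them to match your real logs:

filebeat.inputs:
  - type: log
    paths:
      - /home/tiennd/filebeat/logstash/*.log

processors:
  - dissect:
      # Guessed from your sample; the literal spacing between "PM" and "Room"
      # has to match your log lines exactly.
      tokenizer: "%{date}, %{time} %{ampm}   Room %{room} has %{people} people"
      field: "message"
      target_prefix: "roomlog"

output.elasticsearch:
  hosts: ["localhost:9200"]   # or keep using your existing cloud.id / cloud.auth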

Hope that helps,

Shaunak


If I have multiple log files in the /tiennd/filebeat/logstash directory, e.g. room.log and account.log, and each log file has a unique format, how can I extract the data from these files into separate indices?

You will need to create a separate input for each of these files. So your filebeat.yml configuration will contain something like this:

filebeat.inputs:
  - type: log
    paths:
      - /tiennd/filebeat/logstash/room.log
    index: room_data       # or whatever index you want room logs to go into
  - type: log
    paths:
      - /tiennd/filebeat/logstash/account.log
    index: account_data    # or whatever index you want account logs to go into

Each input can have its own configuration, including index as shown above. See https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html for the configuration options available on the log input.
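
One caveat worth double-checking against the docs for your Filebeat version: when you override the index, Filebeat also expects matching setup.template.name and setup.template.pattern settings, and while index lifecycle management (ILM) is enabled the custom index is ignored. A sketch, with placeholder names:

setup.ilm.enabled: false              # custom indices are ignored while ILM is enabled
setup.template.name: "mylogs"         # placeholder; pick your own template name
setup.template.pattern: "mylogs-*"    # should cover the indices you write to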

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.