Custom logs ingest: how?

I have a very simple log file in JSON format that I want to send to Elasticsearch. The Filebeat quick setup page is not clear on how to make this work.

This is an excerpt of the file to input. This is all the complexity it will ever have, all flattened, no objects, no nothing.

{
    "typography": "4.0.0-SNAPSHOT",
    "licensing-remote": "4.0.5-SNAPSHOT",
    "sign": "8.0.0-SNAPSHOT",
    "testType": "performance",
     ....
}

This is the filebeat.yml file

################### Filebeat Configuration #########################
# ============================== Filebeat inputs ===============================

filebeat.inputs:

  # filestream is an input for collecting log messages from files.
  - type: filestream

    # Change to true to enable this input configuration.
    enabled: true

    # Paths that should be crawled and fetched. Glob based paths.
    paths:
      - C:\\local_path\\elasticsearch*.log

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Set to true to enable config reloading
  reload.enabled: true

  # Period on which files under path should be checked for changes
  #reload.period: 10s
  reload.period: 10s

  # ======================= Elasticsearch template setting =======================

setup.template.enabled: true
setup.template.overwrite: true
setup.template.name: "performance-tests"
setup.template.pattern: "performance-functional-test*"
setup.ilm.enabled: false

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["http://localhost:9200"]

  # Optional index name. The default is "filebeat" plus date
  index: "performance-functional-test-%{+yyyy.MM.dd}"

Problem:
When running setup, a whole lot of (ECS) modules and fields are loaded by default that I can't seem to remove, even when I remove all folders, fields.yml and other references to the ECS system.

Does anyone have a clear guide on how to configure Filebeat for a custom ingest pipeline?
After two weeks on the documentation I can tell you, it is not there.

Hi @Johannnnnn

I suspect your first issue is that your JSON is pretty-printed / multi-line. Filebeat is line-oriented, so it expects the JSON to be single-line NDJSON.

If your file actually looks like the above, you're going to have to use a multiline parser, or simply convert it to newline-delimited JSON with jq. Here are some instructions; ignore that they are written for Logstash, the concept is the same.

Once it is in NDJSON, see here.

If you don't want to do that, then please provide an actual sample of your file with multiple entries. Then we'll have to construct a multiline parser, which is not always easy. If every entry always has exactly the same number of lines, you can do multiline with just a line count. Otherwise you have to construct a regular expression.
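
For the fixed-line-count case, a minimal sketch of a filestream input with a multiline parser could look like the following. The count_lines value of 6 is only an assumption taken from the truncated excerpt in the question and has to match the real number of lines per entry; the input ID is hypothetical and the path is copied from the question. Chaining the ndjson parser after multiline assumes the parsers run in the order listed, so the joined lines are then decoded as one JSON object.

```yaml
filebeat.inputs:
  - type: filestream
    id: my-filestream-id          # hypothetical ID, any unique string
    paths:
      - C:\\local_path\\elasticsearch*.log   # path copied from the question
    parsers:
      # Join a fixed number of lines into one event.
      # 6 is an assumption; set it to the actual lines per JSON object.
      - multiline:
          type: count
          count_lines: 6
      # Regex alternative (use instead of the count parser above):
      # start a new event at every line that begins with "{".
      # - multiline:
      #     type: pattern
      #     pattern: '^\{'
      #     negate: true
      #     match: after
      # Decode the joined text as JSON and put the keys at the event root.
      - ndjson:
          target: ""
```

If the number of lines per entry ever varies, the count variant will split events in the wrong place, which is why the pattern variant is usually the safer of the two.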

Second part: I do not recommend trying to remove parts of Filebeat to get rid of the fields. Just use the drop_fields processor and drop the host, agent, ECS fields, etc. People do that all the time. It's very common to clean up the output ... quick and easy.
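
As a rough sketch of that cleanup, a drop_fields processor in filebeat.yml might look like this. The list of field names is an assumption about which metadata Filebeat adds in your setup; check a sample document in Elasticsearch and adjust it.

```yaml
processors:
  - drop_fields:
      # Assumed list of Filebeat-added metadata groups; adjust to taste.
      fields: ["agent", "ecs", "host", "input", "log"]
      # Do not fail the event if one of the listed fields is absent.
      ignore_missing: true
```

Note that @timestamp and type cannot be dropped, even if they are listed.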

Ah, looking into it. Thank you for your excellent answer.

Again, thank you for your reply, @stephenb .

All I want is this:

  • single line json ingest (check)
  • custom index name pattern

If it isn't necessary, I don't really need an index template; I don't really care about that. Again, my problem is that there is a lot of writing about the options, but no use cases.

This is the abbreviated input json on one line (as it is now in the log file):

{"typography":"4.0.0-SNAPSHOT","licensing-remote":"4.0.5-SNAPSHOT","sign":"8.0.0-SNAPSHOT","testType":"performance","jenkinsBuild":"","operatingSystem":"Windows11","pdfocr-api":"3.0.0-SNAPSHOT","platform":"Java","commons":"8.0.0-SNAPSHOT","pdfxfa":"4.0.0-SNAPSHOT","testIssue":"QA-14649","platformVersion":"1.8.0_333-b02","pdf2data":"4.0.1-SNAPSHOT","pdfa":"8.0.0-SNAPSHOT","elapsedTime":"6082619500"}

This is my yml file

###################### Filebeat Configuration #########################

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Unique ID among all inputs, an ID is required.
  id: my-filestream-id

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - C:\\Projects\\logs\\elastic\\logs\\elasticsearch*.log
  parsers:
    - ndjson:
        target: ""

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 1
setup.template.enabled: false
setup.template.name: "performance-tests"
setup.template.pattern: "performance-functional-test*"
    
setup.ilm.enabled: false
# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  index: "performance-functional-test-%{+yyyy.MM.dd}"

This should be my fields.yml, although I am not sure I can define these fields there; I cannot find any resource on this mechanism. If I don't want an index template, should I remove it?

- key: performance
  title: Performance test logs
  description: Fields for the performance functional testing logs
  fields:
  - name: '@timestamp'
    level: core
    required: true
    type: date
    description: 'Date/time when the event originated.

      This is the date/time extracted from the event, typically representing when
      the event was generated by the source.

      If the event source has no original timestamp, this value is typically populated
      by the first time the event was received by the pipeline.

      Required field for all events.'
    example: '2016-05-23T08:05:34.853Z'
    default_field: true
  - name: barcodes
    level: core
    type: keyword
    description: ' Barcodes version'
    example: '"barcodes": "x.y.z"'
  - name: branchName
    level: core
    type: keyword
    description: 'Name of the branch for which the test ran'
    example: '"branchName": "master"'
  - name: cleanup
    level: core
    type: keyword
    description: ' Cleanup version'
    example: '"cleanup": "x.y.z"'
  - name: commons
    level: core
    type: keyword
    description: ' Commons version'
    example: '"commons": "x.y.z"'
  - name: dependency.tree
    level: core
    type: text
    description: 'complete dependency tree generated by maven'
    example: '"dependency.tree": "..."'
  - name: dito-sdk-java
    level: core
    type: keyword
    description: ' SDK version'
    example: '"dito-sdk-java": "x.y.z"'
  - name: elapsedTime
    level: core
    type: number
    description: 'elapsed time in nano seconds'
    example: '"elapsedTime": "1234567890"'
  - name: elapsedTimeUnit
    level: core
    type: constant_keyword
    description: 'elapsed time in nano seconds'
    example: '"elapsedTimeUnit": "ns"'
  - name: font-asian
    level: core
    type: keyword
    description: ' Asian fonts version'
    example: '"font-asian": "x.y.z"'
  - name: forms
    level: core
    type: keyword
    description: ' Forms version'
    example: '"forms": "x.y.z"'
  - name: functional-tests
    level: core
    type: keyword
    description: 'functional tests repo version'
    example: '"functional-tests": "x.y.z"' 
  - name: html2pdf
    level: core
    type: keyword
    description: ' html2pdf version'
    example: '"html2pdf": "x.y.z"'
  - name: io
    level: core
    type: keyword
    description: ' io version'
    example: '"io": "x.y.z"'
  - name: jenkinsBuild
    level: core
    type: keyword
    description: 'jenkins build number'
    example: '"jenkinsBuild": "x.y.z"'
  - name: kernel
    level: core
    type: keyword
    description: ' kernel version'
    example: '"kernel": "x.y.z"'
  - name: layout
    level: core
    type: keyword
    description: ' layout version'
    example: '"layout": "x.y.z"'
  - name: licensing-base
    level: core
    type: keyword
    description: ' licensing client version'
    example: '"licensing-base": "x.y.z"'
  - name: licensing-remote
    level: core
    type: keyword
    description: ' licensing server version'
    example: '"licensing-remote": "x.y.z"'
  - name: method
    level: core
    type: text
    multi_fields:
        - name: keyword
          type: keyword
    description: 'method name'
    example: '"method": "doSomeThing"'
  - name: operatingSystem
    level: core
    type: text
    multi_fields:
        - name: keyword
          type: keyword
    description: 'operating system: two flavors of Windows, or Linux'
    example: '"operatingSystem": "Windows 11"'
  - name: 
    level: core
    type: keyword
    description: '  version'
    example: '"": "x.y.z"'
  - name: pdfa
    level: core
    type: keyword
    description: ' pdfa version'
    example: '"pdfa": "x.y.z"'
  - name: pdfocr-api
    level: core
    type: keyword
    description: ' pdfocr client version'
    example: '"pdfocr-api": "x.y.z"'
  - name: pdfocr-tesseract4
    level: core
    type: keyword
    description: ' pdfocr tesseract4 version'
    example: '"pdfocr-tesseract4": "x.y.z"'
  - name: pdfoffice
    level: core
    type: keyword
    description: ' pdfoffice version'
    example: '"pdfoffice": "x.y.z"'
  - name: pdfoptimizer
    level: core
    type: keyword
    description: ' pdfoptimizer version'
    example: '"pdfoptimizer": "x.y.z"'
  - name: pdfrender
    level: core
    type: keyword
    description: ' pdfrender version'
    example: '"pdfrender": "x.y.z"'
  - name: pdfrender-cli
    level: core
    type: keyword
    description: ' pdfrender-cli version'
    example: '"pdfrender-cli": "x.y.z"'
  - name: pdftest
    level: core
    type: keyword
    description: ' pdftest version'
    example: '"pdftest": "x.y.z"'
  - name: pdfxfa
    level: core
    type: keyword
    description: ' pdfxfa version'
    example: '"pdfxfa": "x.y.z"'
  - name: platform
    level: core
    type: keyword
    description: 'Java or .Net'
    example: '"platform": "Java"'
  - name: platformVersion
    level: core
    type: keyword
    description: 'platform version'
    example: '"platformVersion": "x.y.z"'
  - name: scenarioName
    level: core
    type: text
    multi_fields:
        - name: keyword
          type: keyword
    description: 'Jira Scenario title - can be dynamically built'
    example: '"scenarioName": "Some text with spaces"'
  - name: sign
    level: core
    type: keyword
    description: ' sign version'
    example: '"sign": "x.y.z"'
  - name: startDateTesting
    level: core
    type: date
    description: 'Start date and time of the whole test run.
    
        Because performance tests are run at night it is possible
        that one run spans two days. It would be then hard to 
        bucket runs in a certain moment.
    '
    example: '2016-05-23T08:05:34.853Z'
  - name: svg
    level: core
    type: keyword
    description: ' svg version'
    example: '"svg": "x.y.z"'
  - name: testIssue
    level: core
    type: keyword
    description: 'Jira test ID'
    example: '"testIssue": "QA-xxxx"'
  - name: testType
    level: core
    type: keyword
    description: 'functional or performance'
    example: '"testType": "performance"'
  - name: typography
    level: core
    type: keyword
    description: ' typography version'
    example: '"typography": "x.y.z"'

An index template is absolutely the correct approach / best practice for what you are doing. Why do you "not want one"? A daily index is exactly what an index pattern is for. It will take you about 15 minutes to put together; basically your fields.yml gives you everything you need.

If you use your own index template, you do not use the fields.yml.