Filebeat -> Elasticsearch for logrus running on k8s

Gotcha, that makes sense. Excitingly, I am seeing promising results from trying your exact pipeline, so thank you for that!

I will have to do some analysis of the Filebeat logs themselves to see where we are failing, as I see a few "Cannot index event publisher.Event" errors for some of the more esoteric logs. We will then have to decide between going live with filebeat-* patterns and no configured ILM policies, or trying to get that all wrapped up first.

Re:

part of why I'm asking where to set index configurations is to avoid doing it through the UI - we have a strong preference to use committed files for infrastructure (e.g., terraform), since we frequently spin-up / spin-down different environments, and being able to correctly provision an ES cluster with all settings configured is the "holy grail" for our CI/CD workflow.

Do you think it's fair to try to do all of your referenced ILM policy configuration through the Elasticsearch resources in the Terraform Registry?


If you used the default Filebeat settings, your filebeat.yml could be this simple:

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/container-mixed/*.log

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]
  # Ingest pipeline that parses the mixed container logs server-side
  pipeline: "discuss-mixed-container"

processors:
  # Tag every event so the app logs can be filtered in Kibana
  - add_tags:
      tags: [monitor-logs]

If you set up all your own templates, ILM, etc., it could be this simple:

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/container-mixed/*.log

setup.kibana:

# Must have these if you want to set your own index name!!!
setup.template.enabled: false
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: "discuss-mixed-container"
  index: my-applogs

Ahhh GOOD!! Yes, my most successful teams do this all through the API and CI/CD! Good to hear.

Yes, I think so. It is really the template, the ILM policy, and the initial managed index, which is basically an empty index with a write alias. I think the example is even that.

A write alias is the base name that Filebeat always writes to, say
log-app-frontend - that is what would be the index setting in Filebeat.

But it really points to

log-app-frontend-2022.08.23-000001 for instance ...

Then it rolls over: Filebeat keeps writing to log-app-frontend, but the backing index is

log-app-frontend-2022.08.24-000002

That is how the write alias and rollover work together to give you ILM...
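
In Terraform that bootstrap step might look roughly like this - a sketch only, assuming the elasticstack provider's elasticstack_elasticsearch_index resource and its alias block (check the provider docs); the index and alias names just follow the example above:

resource "elasticstack_elasticsearch_index" "log_app_frontend_bootstrap" {
  # First backing index; rollover creates ...-000002, ...-000003 and so on
  name = "log-app-frontend-2022.08.23-000001"

  alias {
    # The name Filebeat writes to via its `index:` setting
    name           = "log-app-frontend"
    is_write_index = true
  }
}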

Yes, you can just iterate to clean that all up... There are some more advanced patterns you can do in the pipeline if there are errors, like reverting to the raw log line and pumping those events to a separate index, but the best thing is to just iterate... get 95%+ of your logs parsing and then work from there.
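
As a rough sketch of that fallback pattern - the pipeline name and the parsing processor here are placeholders, and the resource/attribute names are assumed from the elasticstack provider:

resource "elasticstack_elasticsearch_ingest_pipeline" "app_logs" {
  name = "app-logs"

  # Placeholder parsing step - whatever your real pipeline does
  processors = [
    jsonencode({
      json = {
        field        = "message"
        target_field = "app"
      }
    })
  ]

  # If any processor fails, keep the raw line and route the event to a catch-all index
  on_failure = [
    jsonencode({
      set = { field = "_index", value = "logs-app-parse-failures" }
    })
  ]
}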

Filebeat comes with an ILM policy... you can just edit it to your purpose.. that is the easy part. I would do it through the UI.. then GET the policy afterwards so you can see what a well-formed ILM policy looks like..

Ohh cool that you got it working
Add those app tags to filter on...
Then you can switch to a separate index name in a bit once you get it all going.. a little change for your customer... depending on the volume...

You might find they like it... hey, I can see / search the app log and the front-end log in the same search... oooo, they did not think of that! :slight_smile: .... i.e. the equivalent of log-* for everything, then just turn tag filters on / off rather than having to select different index patterns.

Ok stepping away... keep moving! Good Thread!

Hi Stephen,

I'm returning to setting up indices. We prefer to completely define our configuration rather than try to rely on defaults.

My goal is to do whatever the recommended best practice is for managing different datasets with different retention windows, through indices, ILM, templates, aliases, and data streams.

Towards that end I'm experimenting with defining index lifecycles and templates in terraform like so:

resource "elasticstack_elasticsearch_index_lifecycle" "hot_warm_delete_10_30" {
  name = "hot_warm_delete_10_30"

  hot {
    min_age = "1h"
    rollover {
      max_age = "1d"
      max_primary_shard_size = "30gb"
    }
  }

  warm {
    min_age = "10d"
    readonly {
      enabled = true
    }
    forcemerge {
      max_num_segments = 1
    }
    shrink {
      number_of_shards = 1
    }
  }

  delete {
    min_age = "30d"
    delete {
      delete_searchable_snapshot = true
    }
  }
}

resource "elasticstack_elasticsearch_index_template" "logs_app" {
  name = "logs_app"

  priority = 2022

  index_patterns = [
    "logs-app-*"
  ]

  template {
    settings = jsonencode({
      "lifecycle.name" = elasticstack_elasticsearch_index_lifecycle.hot_warm_delete_10_30.name
    })
  }
}

with the idea that I will be able to configure additional lifecycles like hot_delete_10 and hot_warm_cold_delete_10_30_90 to manage the migration and retention of different data sources.

However, I'm uncertain how to connect the data I'm writing from filebeat to these template configurations.

From the Filebeat logs, it looks like in order to set an index like

index: "logs-%{kubernetes.container.name}-%{kubernetes.labels.app_kubernetes_io/name}-%{+yyyy.MM.dd}"

(where the kubernetes fields are populated by processors.add_kubernetes_metadata)

I would also have to configure setup.template.name and setup.template.pattern, which are fields defined not per filebeat.inputs but generally for the filebeat configuration

I am hoping to avoid defining datastreams explicitly in terraform - I would like to infer them from kubernetes fields so that new services don't need to modify the elasticstack or filebeat configurations to start logging

Would you be able to describe the right way to do this configuration? It's a little unclear to me how to completely wire up sending data from Filebeat to different data streams conditionally on the source/fields of the data.

Thank you!
Austin

You are asking a lot @Austin_ES_Questions :slight_smile:

This is how my mature customers work, so it is a great approach... You will just need to understand the concepts and their relationships. You will do 90% via your automation, and very little will actually go in filebeat.yml.

So a couple things before we get started.

This is important

  1. You mention data streams a few times: are you staying with 7.17.X or are you moving to 8.x? Data streams are the default in 8.x, so if you want to head there / learn that, I would do that now...

What version are you going to stick with, and do you want indices or data streams? They are not the same, and it seems you are kind of using them interchangeably - they are not...

Take a read here and let me know which / what you want to do... because they are 2 different approaches... the future is data streams.

  2. [quote="Austin_ES_Questions, post:25, topic:312416"]
    I would also have to configure setup.template.name and setup.template.pattern, which are fields defined not per filebeat.inputs but generally for the filebeat configuration
    [/quote]

Since you are going to do all these wonderful things via automation, you will not be setting anything up via Filebeat :slight_smile: Your filebeat.yml will get much simpler.

No, that is not correct... I will explain later... if you look above, I spoke about write aliases or data streams...

We are using the Helm charts provided by Elastic: https://artifacthub.io/packages/helm/elastic/elasticsearch

and the elastic terraform providers:
https://registry.terraform.io/namespaces/elastic

So our Elasticsearch cluster is version 8.3.3 but Filebeat is 7.17.3.

We do not have a preference for indices/aliases/ILM vs. data streams - whatever the officially supported solution is for deploying Elasticsearch + Filebeat from Elastic on Kubernetes.

I assume you will move to Filebeat 8.X when a Helm chart is available?

I am not a Helm person, so I cannot help with that.

I would probably pick data streams as they are the path forward for time series data.

So you would follow these steps on this page and here

The good thing is that if you use component templates and your logs are mostly the same, then just the index template will be different for each... so most everything will be reusable...

Start with one of your data streams like log-app-frontend-sdk and go through the steps... when you have all that, then the Filebeat stuff should be easy...

You will disable the template setup and just set the data stream name as the index field in the elasticsearch output.
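
Roughly, the Elasticsearch side for one data stream could look like this in Terraform - a sketch only, with resource/attribute names assumed from the elasticstack provider and logs-app-frontend-sdk used as an example name:

# Shared settings live in a component template so every data stream can reuse them
resource "elasticstack_elasticsearch_component_template" "logs_app_base" {
  name = "logs_app_base"

  template {
    settings = jsonencode({
      "lifecycle.name" = elasticstack_elasticsearch_index_lifecycle.hot_warm_delete_10_30.name
    })
  }
}

# Thin per-stream index template; the empty data_stream block means writes to
# logs-app-frontend-sdk create a data stream, so no bootstrap index or write alias is needed
resource "elasticstack_elasticsearch_index_template" "logs_app_frontend_sdk" {
  name     = "logs_app_frontend_sdk"
  priority = 2022

  index_patterns = ["logs-app-frontend-sdk*"]
  composed_of    = [elasticstack_elasticsearch_component_template.logs_app_base.name]

  data_stream {}
}

Filebeat then just writes to the data stream name via its index setting, as in the config further down.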

Ohhh, maybe I am still missing something major here, like this is all running in K8s and you magically want to write to different data streams.... so yes, you will need to figure out what your data stream name will be based on the metadata...

Not sure exactly what this will look like, but it will need to match your data stream:

logs-%{kubernetes.container.name}-%{kubernetes.labels.app_kubernetes_io/name}

# Required when managing templates / ILM yourself (e.g., via Terraform)
setup.template.enabled: false
setup.ilm.enabled: false

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: "discuss-mixed-container"
  # Field references in the index setting use the %{[field.name]} format-string syntax
  index: "logs-%{[kubernetes.container.name]}-%{[kubernetes.labels.app_kubernetes_io/name]}"

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.