Noob index management - naming / cleanup


(ethr bunny) #1

I've set up a new beat->logstash->elastic system and have started collecting data from a number of sources (files, metrics, etc.). Looking through the indexes available via Kibana I'm seeing nearly a hundred entries of the form 'index-date' (e.g. "filebeat-2018.08.10"; same for "metricbeat").

  • Is this by design? Do I want / need a new index every day?

Looking through 'filebeat.reference.yml' it appears that these are generated by elasticsearch. I don't send data directly there but rather through logstash... or do I have this configured incorrectly? I have the elasticsearch entries in 'filebeat.yml' commented out. There's certainly data being sent in, so I assume the 'logstash' entries are at least accurate.

  • Lots of spurious entries are appearing in the indexes (e.g. "#033[31;1mbuilds#033[0m.keyword")
    --> where are these coming from and how do I eliminate them?

(Andreas H) #2

By default the elasticsearch output plugin will put its documents in a daily index named logstash-%{+YYYY.MM.dd} (see the docs).
This is done on purpose to help you clean up old data. When you create an index pattern inside Kibana you can define it as logstash-* so that your dashboards pick up all of those daily indexes.


(ethr bunny) #3

I'm confused, as I'm sending all my data via logstash. Or are you referring to the logstash->elastic output?


(Andreas H) #4

Yes, sorry. I meant the Logstash output:

output {
  elasticsearch {}
}

This will automatically generate the index name logstash-%{+YYYY.MM.dd} unless you define another index name.


#5

That sounds very much like beats is shipping directly to Elasticsearch. What does e.g. your filebeat output config look like?

Looking through 'filebeat.reference.yml' it appears that these are generated by elasticsearch.

Elasticsearch indices are created by whichever client sends the data, i.e. Beats or Logstash; Elasticsearch does not create them on its own.

Do I want / need a new index every day?

Having daily indices, at least for logs, helps with housekeeping. It is much easier to delete whole indices based on time than it is to delete documents from indices based on time. For logs you usually set some sort of retention period, unless you have unlimited storage capacity.

You could go less granular and have weekly indices as well, especially if your log volume is low.
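To make the housekeeping point concrete, here is a minimal sketch (not from the thread) of naming the daily index that falls outside a retention window so it can be deleted wholesale; the index prefix, retention period, and Elasticsearch host are illustrative assumptions.

```shell
#!/bin/sh
# Hedged sketch: compute the name of the daily index that is N days old.
# Prefix and host below are assumptions; adjust to your setup.
old_index() {
  prefix=$1
  days=$2
  # GNU date assumed for the -d "-N days" offset syntax
  printf '%s-%s\n' "$prefix" "$(date -u -d "-${days} days" +%Y.%m.%d)"
}

# e.g. drop the filebeat index from 30 days ago:
# curl -XDELETE "http://localhost:9200/$(old_index filebeat 30)"
old_index filebeat 30
```

Deleting a whole index like this is one cheap operation, whereas a delete-by-query over the same time range has to touch every matching document. In practice a tool such as Elasticsearch Curator can run this kind of cleanup on a schedule.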


(ethr bunny) #6

My config:

filebeat.inputs:

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/*.log
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  exclude_files: ['.gz$']

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 3

#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["logstash.otech:5044"]

I use puppet to push out this same config to all hosts sending logs.


#7

Have you restarted the filebeat service? With reload.enabled: false you will have to do that manually.


(ethr bunny) #8

Yes. I believe the indexes are being generated by logstash though:
(from /etc/logstash/conf.d/beats.conf)('input' and 'filter' sections removed)

output {
  elasticsearch {
    hosts => "10.xx.xx.xx:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}
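That index setting is exactly what produces the names in the first post: the sprintf reference %{[@metadata][beat]} expands to the name of the beat that shipped the event, and %{+YYYY.MM.dd} to the event's date. A tiny sketch of the expansion (the values are illustrative):

```shell
# Hedged sketch: how index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}" expands
# for an event shipped by filebeat with an @timestamp of 2018-08-10.
beat="filebeat"      # value of [@metadata][beat]
day="2018.08.10"     # event date in the +YYYY.MM.dd format
printf '%s-%s\n' "$beat" "$day"   # → filebeat-2018.08.10
```

So one index per beat per day is expected with this config, which is exactly the daily granularity that #5 recommends for retention.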

(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.