Metricbeat on 10 hosts - how to config for low shards


I'm running Metricbeat on around 10 hosts. In ES, each host generates a new index (and shard) every day.
After upgrading to ES7, Kibana throws an error like

Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [2577]/[1000] maximum shards open;

Now I want to change the Metricbeat config so that it generates far fewer shards, but I can't find any suitable config examples showing how to do that.
Could anyone help and spell out exactly what I have to put in metricbeat.yml to achieve that? :slight_smile:

(Mark Walkom) #2

This is a great use case for implementing ILM. has some of what you want.
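
As a rough sketch of what enabling ILM could look like in metricbeat.yml, assuming Metricbeat 7.0 talking directly to Elasticsearch 7.0: the host below is a placeholder, and the alias and policy names are only illustrative. In 6.6/6.7 the feature was still beta and was configured as `ilm.enabled` under the Elasticsearch output instead.

```yaml
# metricbeat.yml fragment, assuming Metricbeat 7.0 (not taken from the original post)
output.elasticsearch:
  hosts: ["localhost:9200"]        # placeholder, use your own ES host

# Write to a rollover alias managed by ILM instead of one index per day.
# "auto" turns ILM on when the connected Elasticsearch supports it.
setup.ilm.enabled: auto

# Alias that Metricbeat writes to; this keeps the version in the name,
# so each Beat version still gets its own series of indices.
setup.ilm.rollover_alias: "metricbeat-%{[agent.version]}"

# Name of the lifecycle policy to attach (illustrative name).
setup.ilm.policy_name: "metricbeat-system-metrics"
```

With ILM active, a custom `output.elasticsearch.index` setting is ignored, and a new index is only created when the policy's rollover conditions (size or age) are met, rather than every day. Retention, such as deleting data after 180 days, would be handled by a delete phase in the policy itself, which is defined in Elasticsearch or Kibana rather than in this file.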

(Christian Dahlqvist) #3

Is Metricbeat indexing directly into Elasticsearch or via Logstash? What does your config look like?


I'm indexing directly into ES.
The config is very close to the default, simply because it worked without problems (at the start). Anyway, here's a sample config from a live host.

Please also note the shards/indices list below the config!

###################### Metricbeat Configuration Example #######################

#==========================  Modules configuration ============================

metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
  #_source.enabled: false

#================================ General =====================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#  env: staging

#============================== Dashboards =====================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here, or by using the `-setup` CLI flag or the `setup` command.
#setup.dashboards.enabled: false

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the
# website.

#============================== Kibana =====================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.

setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.

#================================ Outputs =====================================

# Configure what output to use when sending the data collected by the beat.

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: [""]

  # Enabled ilm (beta) to use index lifecycle management instead daily indices.
  #ilm.enabled: false

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

#================================ Processors =====================================

# Configure processors to enhance or manipulate events generated by the beat.

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

#================================ Logging =====================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publish", "service".
#logging.selectors: ["*"]

Here's a snippet of the _cat/indices?v list. Every host is creating an index each day, which doesn't seem to be the best solution. However, I don't know what the best solution is here... One big index for all hosts per year? One index per week for all hosts?
The output of each Metricbeat host is the same: system metrics. So technically it seems useful to put all data in one index, but I couldn't find a simple answer to this simple question :frowning:

health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   metricbeat-6.4.0-2019.03.17 6Sc5wLadSHO1CrX7KbRecQ   1   1     441585            0     73.5mb         73.5mb
yellow open   metricbeat-6.6.2-2019.04.02 MnLxvzQPRYuKRVZUUhIS_A   1   1     545701            0    115.8mb        115.8mb
yellow open   metricbeat-6.4.0-2018.11.05 e-kxrEOeSHmlBGsVpKJFog   1   1    6776060            0      1.1gb          1.1gb
yellow open   metricbeat-6.5.1-2018.12.23 SJE2H_kMSq-XsnrMheNFDg   1   1     196861            0     33.5mb         33.5mb
yellow open   metricbeat-6.5.0-2019.01.07 NzQs3oJqQo-Lr16wJDRMzQ   1   1    1022098            0    163.1mb        163.1mb
yellow open   metricbeat-6.4.0-2018.10.29 NOPk3ZLXQLizufAHZOm7yA   1   1    6734894            0        1gb            1gb
yellow open   metricbeat-6.2.4-2018.10.14 vNKq-LGOQp-ETrzn8a2MUg   1   1     609153            0    135.8mb        135.8mb
yellow open   metricbeat-6.2.4-2019.03.06 wP8i1oQbSl2NDts_2mPIMA   1   1     530253            0    123.9mb        123.9mb
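
Every index in this list has 1 primary and 1 replica and is yellow, which usually means the replica cannot be allocated. Assuming this is a single-node cluster (the thread does not say so), a minimal sketch of the template settings that would stop new indices from requesting a replica might look like this:

```yaml
# metricbeat.yml fragment: sketch, assuming a single-node cluster where
# replicas can never be assigned anyway
setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 0      # added line: no replica shard per index
  index.codec: best_compression

# Re-upload the index template so the changed settings take effect,
# even if a template with the same name already exists.
setup.template.overwrite: true
```

As far as I know, the shard limit in the error message counts replica shards too, so this would roughly halve the number of shards each new index adds. It only affects indices created after the template is updated; existing indices would need their replica count changed through the index settings API.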

In my case, I just want an overview of the system metrics of my whole setup. I have Grafana and Kibana dashboards that I built myself to monitor the values I need, which work perfectly (apart from the performance).
Up to ES6 I used elasticsearch-curator to clean up metrics older than 180 days, which no longer works with ES7. That's why I'm trying to store the data in an appropriate way.
If possible, I'd like to keep the data for 5 years. Currently I'm working with 180 days, but I'd like to extend that time span.
Currently I have 300 GB in use. There is ~1.2 TB of usable RAID 5 SSD space.
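
As a back-of-the-envelope check on the 5-year goal, assuming the 300 GB corresponds to roughly 180 days of data and the ingest rate stays constant (neither is confirmed in the thread, and the index sizes listed above vary a lot from day to day):

```latex
\begin{align*}
\text{daily volume}  &\approx \frac{300\ \text{GB}}{180\ \text{days}} \approx 1.67\ \text{GB/day} \\
\text{5-year volume} &\approx 1.67\ \text{GB/day} \times 1825\ \text{days} \approx 3040\ \text{GB} \approx 3\ \text{TB}
\end{align*}
```

Under those assumptions, 5 years of data would need about 2.5 times the ~1.2 TB available, so a longer retention would require a lower ingest rate (fewer metricsets or a longer collection period) or more disk space.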

(Mark Walkom) #5

Probably because you have mixed versions deployed. If you were all on the same version, everything would go into one index. It'd be worth upgrading things :slight_smile:



That's not helpful advice. I see that problem too, but the output of each host is exactly the same, regardless of the Beat version!
If "one index per day for all hosts" is a good setting (I really don't know), please let me know how to suppress the version number in the index name, which is not relevant for me.
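
For illustration, a version-free index name can be set on the Elasticsearch output. This is only a sketch: the host is a placeholder, and the template name and pattern are illustrative values that have to be set whenever the index name is changed.

```yaml
# metricbeat.yml fragment: sketch of an index name without the Beat version
output.elasticsearch:
  hosts: ["localhost:9200"]              # placeholder, use your own ES host
  index: "metricbeat-%{+yyyy.MM.dd}"     # one daily index shared by all hosts

# Required when the index name is customized.
setup.template.name: "metricbeat"
setup.template.pattern: "metricbeat-*"

# Metricbeat 7.x only: a custom index name is ignored while ILM is enabled.
# Leave this line out on 6.x.
setup.ilm.enabled: false
```

The trade-off is that all versions then share one template and one mapping, so field changes between Beat versions can lead to mapping conflicts within the same index.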

Keep in mind, indices from some months ago were created with Beat 6.4, for example, later ones with 6.5.1, and so on...

Furthermore, let me state that "updating" isn't as easy as you think.
I mean, updating 10 hosts will take its time:

  • getting lists (apt update)
  • install updates (apt upgrade / apt install metricbeat)
  • restart beat

It might take 20 minutes for all hosts, and THEN you WILL have that index-name problem again.
If host 1 is updated now, it will create an index named (for example) metricbeat-7.0-2019-04-16, while host 10 is still running version 6.7.
I think you got the "problem" here :wink:

So how do I manage that?
What is the best practice for storing system metrics over a long period of time?
What config do the hosts need?
What is the best practice for indices/shards in this case?

(Mark Walkom) #7

The reason we separate by version is that things change, usually when we add extra fields due to extra functionality. You could collapse 6.5.N into one index, you can try doing it for all 6.N, but you may also run into problems.

Automation. And accepting that things take time. It's not a terrible thing :slight_smile:

ILM as mentioned. It'll manage a lot of this.

Otherwise, you can change the Beat config to use a monthly index pattern (for example) and manage it that way.
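
A monthly pattern only needs a different date format in the index name. The sketch below keeps the version in the name and assumes a 6.x Beat, where the version field is `beat.version` (7.x renamed it to `agent.version`); the host is a placeholder.

```yaml
# metricbeat.yml fragment: sketch of a monthly index per Beat version (6.x field names)
output.elasticsearch:
  hosts: ["localhost:9200"]                            # placeholder
  index: "metricbeat-%{[beat.version]}-%{+yyyy.MM}"    # e.g. one index per month

# Template name and pattern must be set when the index name is changed.
setup.template.name: "metricbeat-%{[beat.version]}"
setup.template.pattern: "metricbeat-%{[beat.version]}-*"
```

With one primary shard per index, this produces roughly 12 indices per year per deployed version instead of about 365.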

(Staale) #8

Look into using a config management system like Ansible or Puppet. It makes the update process automatic and runs it on all hosts almost in parallel.
Blaming time for not keeping the systems updated is not a good thing.
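
As a minimal sketch of what that could look like with Ansible, mirroring the manual steps listed earlier (refresh package lists, install the update, restart the Beat): the playbook file name and the inventory group `metricbeat_hosts` are hypothetical, and it assumes Debian/Ubuntu hosts with the Elastic apt repository already configured.

```yaml
# upgrade-metricbeat.yml (hypothetical file name)
- name: Upgrade Metricbeat on all monitored hosts
  hosts: metricbeat_hosts          # hypothetical inventory group
  become: true
  tasks:
    - name: Refresh the apt cache and install the newest metricbeat package
      ansible.builtin.apt:
        name: metricbeat
        state: latest
        update_cache: true
      notify: Restart metricbeat   # only fires if the package actually changed

  handlers:
    - name: Restart metricbeat
      ansible.builtin.service:
        name: metricbeat
        state: restarted
```

Ansible runs each task across several hosts in parallel by default, so the window in which hosts report with different Beat versions shrinks to the duration of a single package install.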