Recommended Elasticsearch node requirements for monitoring 2,000 nodes

Hello, I'm starting to research Metricbeat + Kibana Inventory to monitor system resources.
The total number of machines we are planning to monitor is about 2,000.

Currently I have deployed Metricbeat to 79 hosts and can explore the metrics in Kibana Inventory without any problems.

The next step will be expanding it to more nodes, and I want to provision enough Elasticsearch nodes for it.

So far I've tested on this environment:

  • 5 data nodes (16 GB / 16 cores) + 3 master nodes (16 GB / 16 cores)

metricbeat.yml

logging.level: info

output.elasticsearch:
  ...
  worker: 2
  bulk_max_size: 1024

queue:
  mem:
    events: 4096
    flush.min_events: 2048
max_procs: 1

setup.ilm:
  enabled: auto

setup.dashboards.enabled: false
setup.template.settings:
  index:
    codec: best_compression
    number_of_shards: 5
    number_of_replicas: 1
    refresh_interval: 10s

setup.kibana:
  ...

#------ Metricbeat-specific configuration

metricbeat.max_start_delay: 10s
metricbeat.modules:
  - module: system
    metricsets:
      - cpu             # CPU usage
      - load            # CPU load averages
      - memory          # Memory usage
      - network         # Network IO
      - uptime          # System Uptime
      - fsstat          # File system summary metrics
      - diskio          # Disk IO
      - process_summary # Process summary


    enabled: true
    period: 10s
    processes: ['.*']

    # Configure the metric types that are included by these metricsets.
    cpu.metrics:  ["percentages", "normalized_percentages"]  # The other available options are normalized_percentages and ticks.
    core.metrics: ["percentages"]  # The other available option is ticks.
processors:
- add_host_metadata:
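As a rough sanity check on what this config implies, the sketch below computes a lower bound on document rates: the system module's 8 metricsets each emit at least one event per 10s period, while metricsets like network and diskio emit one event per interface/device, so actual rates run higher. The host count and period are taken from this thread; everything else is straightforward arithmetic.

```python
# Lower-bound document rate implied by the config above.
# Assumption: one event per metricset per period; per-interface/per-device
# metricsets (network, diskio) will produce more in practice.
PERIOD_S = 10    # metricbeat period from the config
METRICSETS = 8   # cpu, load, memory, network, uptime, fsstat, diskio, process_summary
HOSTS = 2000     # planned fleet size from this thread

docs_per_min_per_host = METRICSETS * (60 // PERIOD_S)
fleet_docs_per_sec = HOSTS * METRICSETS / PERIOD_S

print(docs_per_min_per_host)  # 48
print(fleet_docs_per_sec)     # 1600.0
```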

Hi @Bingu_Shim, I've reached out internally regarding your question, and hope to have some information for you soon.


Hi, after speaking with a colleague, unfortunately it's hard for us to answer this as it's often a case of "it depends".

Our recommendation would be to experiment with as much (realistic) data as possible and see if there's a bottleneck. Serving 2,000 nodes in the UI shouldn't be a problem, but your cluster might not be able to handle the write load. If that's the case, you may need to add more shards to optimise for a write-heavy environment.
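If write throughput does turn out to be the bottleneck, one way to reason about primary shard count is to divide the expected ingest rate by a per-shard indexing capacity measured on your own hardware. This is only a sketch: the per-shard capacity below is a placeholder assumption, not a benchmark result, and the ~4,600 TPS total is the estimate that appears later in this thread.

```python
import math

# total_write_tps: the thread's estimate for 2,000 hosts.
# assumed_shard_capacity_tps: HYPOTHETICAL per-primary-shard indexing rate;
# replace with a figure benchmarked on your own cluster (e.g. with Rally).
total_write_tps = 4600
assumed_shard_capacity_tps = 2000

primary_shards = math.ceil(total_write_tps / assumed_shard_capacity_tps)
print(primary_shards)  # 3, with these placeholder numbers
```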

Hello @Kerry

Thank you for your response.
With my configuration above, the numbers of documents written per minute are as follows.

event.dataset            docs per minute
system.diskio            6
system.fsstat            6
system.load              6
system.uptime            6
system.cpu               6
system.memory            6
system.process.summary   6
system.network           96
total                    138

So, I can derive the required write performance as follows.

  • 2.3 TPS per node
  • 4,600 TPS for 2,000 nodes
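The derivation above can be reproduced directly from the per-dataset table:

```python
# Measured docs/minute per dataset, from the table above.
docs_per_minute = {
    "system.diskio": 6,
    "system.fsstat": 6,
    "system.load": 6,
    "system.uptime": 6,
    "system.cpu": 6,
    "system.memory": 6,
    "system.process.summary": 6,
    "system.network": 96,
}

total_per_min = sum(docs_per_minute.values())
tps_per_node = total_per_min / 60
tps_for_fleet = total_per_min * 2000 / 60  # scale to 2,000 hosts

print(total_per_min)           # 138
print(round(tps_per_node, 1))  # 2.3
print(tps_for_fleet)           # 4600.0
```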

Our Elasticsearch cluster should be able to handle this write load. (I've run write performance tests at over 30K TPS in our environment, though the use case was different.)

What I want to know about is latency on the Metrics UI side.
I ran into UI latency problems when using Elastic APM (THIS ISSUE, THIS ISSUE), and found out that the team had just started architectural improvements to solve the latency problem.

So we just want to be sure about the scalability of the Metrics UI before going further.

As you mentioned below, there shouldn't be a scalability problem on the UI side:

Serving 2,000 nodes in the UI shouldn't be a problem

We will try rolling out Metricbeat to more machines.