Interesting... Your diskio document counts are very high; there must be many, many disks on each host, as they are about 3-5x what I would expect.
But the other numbers seem OK, I guess; it still seems high for 1000 Metricbeat hosts.
So here are some of the things I would do; others may have other opinions.
You can try this on your 1 node.
You already have a lot of segments (the underlying Lucene data structures on disk). You are not force merging when you roll over, so you are generating segments that are beginning to add up. Lots of segments = slow queries; you already have 322 segments when you only have 22 shards.
In the ILM policy, on rollover, set force merge to 1 segment.
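As a minimal sketch of what that could look like (the policy name and rollover thresholds below are placeholders, not your actual values, and this assumes a version where the forcemerge action is allowed in the hot phase, where it runs right after rollover):

PUT _ilm/policy/metricbeat-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "1d"
          },
          // merge each index down to 1 segment once it has rolled over
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      }
    }
  }
}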
You can see your segments with
GET _cat/segments/metricbeat-*?v
You can clean this up by running the following command; this may take 1 or more hours to run, as there is only 1 merge thread per node.
POST metricbeat-*/_forcemerge?max_num_segments=1
It is a synchronous call, but you can still run other commands and check on its progress with:
GET _cat/segments/metricbeat-*?v
Once the segments are merged there will be only 1 per shard.
But overall... if you are really going to ingest and query 350GB/day or more, I would probably run more than a single node. Here are some suggestions; others may have other suggestions.
350GB/day is non-trivial, but we certainly have many use cases with multiple TBs per day; it's about proper scaling.
I would run perhaps 3 nodes, each with a 28GB heap and 1-2 TB of SSD.
Index Template: 3 primary shards, 1 replica (technically this would be better with 6 nodes so each shard can be completely parallel, there is some math). (If you do not want replicas you can do that, but if you lose a node you will lose part of your index.) See the sketch after these suggestions.
ILM Rollover at 150GB or 1 day: this will make 3 x 50GB shards; the shards should balance out and you will get some parallelism.
Force Merge on Hot Rollover to 1 segment.
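A rough sketch of the index template side of that (the template name, index pattern, alias and policy name here are placeholders; with Metricbeat you would more likely set these under setup.template.settings in metricbeat.yml so its own template picks them up, and on older versions the legacy _template API takes a similar settings block):

PUT _index_template/metricbeat-custom
{
  "index_patterns": ["metricbeat-*"],
  "template": {
    "settings": {
      "index": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
        // point the indices at the ILM policy and its write alias
        "lifecycle": {
          "name": "metricbeat-rollover-policy",
          "rollover_alias": "metricbeat"
        }
      }
    }
  }
}

The 150GB / 1 day rollover conditions themselves would go into the ILM policy sketched earlier ("max_size": "150gb", "max_age": "1d").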
Your indexing rate seems OK-ish; there are some settings that could make it better, like:
"index": {
"refresh_interval": "30s",
"translog": {
"flush_threshold_size": "2gb"
}
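For your existing indices you could apply those dynamically with something like the following (both settings are dynamic); for new indices they belong in the template settings shown above:

PUT metricbeat-*/_settings
{
  "index": {
    "refresh_interval": "30s",
    "translog": {
      "flush_threshold_size": "2gb"
    }
  }
}

A 30s refresh just means new documents can take up to 30 seconds to become searchable, which is usually fine for metrics.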
Another consideration would be retention, which you have not mentioned. So say you wanted to keep this for 7 days: 350GB/day + 1 replica = 700GB/day * 7 days = ~5TB of data.
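If 7 days really were the target, retention is normally handled with a delete phase in the same ILM policy; a sketch (this just repeats the earlier policy with a delete phase added, and the "7d" is only this assumed retention, counted from rollover):

PUT _ilm/policy/metricbeat-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "150gb", "max_age": "1d" },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      // drop each index 7 days after it rolls over
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}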
Another consideration is whether you have some bottleneck with IOPS, but it sounds like that is direct-attached SSD, and I am not really familiar with AHCI.