7.0 Upgrade Assistant open shards


(Vam Pikmin) #1

Number of open shards exceeds cluster soft limit

warning

Documentation

There are [9374] open shards in this cluster, but the cluster is limited to [1000] per data node, for [1000] maximum.

Hi everyone,
I'm using only one host and four indexes, ingesting winlogbeat, netflow and syslog data

Of the three, winlogbeat and netflow can each reach up to 1GB a day.

The server is running on a Linux VM under ESXi. If I had to choose between allocating more resources to the existing VM or creating new VMs as replicas, what would be best practice to make it more responsive, given that the replicas would still use the same storage?

Thanks


(Gordon Brown) #2

Hi!

First off, the documentation link is broken - that's a known issue for Elasticsearch 6.6, and will be fixed in an upcoming release. You can find the correct documentation here.

Are you certain that you're only using four indices? For example, by default, winlogbeat will create a new index each day, with the name pattern winlogbeat-<version>-yyyy.MM.dd. You can see all your indices by running GET _cat/indices?v.

Assuming your setup does indeed have 9374 shards on one node, I'm a bit surprised you haven't already had significant problems. Instead of allocating more resources to one node or adding more nodes, I'd suggest you invest some time in:

  1. Adjusting the index templates used by your Beats (and other ingest sources) so that new indices are created with fewer shards (see the sketch after this list),
  2. Deleting some old indices, if possible,
  3. Using the Shrink API to reduce the number of shards you have,
  4. Using Reindexing to collect the data from many small indices into a smaller number of larger indices.
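
For point 1, a rough sketch of what I mean - the template name, order value and index patterns here are just examples, and it would need to take precedence over (or you could instead edit) the templates your Beats and Logstash pipelines already load. Since you mentioned you only have one host, replicas can never be assigned anyway, so this also drops them:

curl -XPUT localhost:9200/_template/one-shard-example -H 'Content-Type: application/json' -d '{
  "index_patterns": ["winlogbeat-*", "netflow-*"],
  "order": 10,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'

For winlogbeat itself you can get the same effect by setting index.number_of_shards: 1 under setup.template.settings in winlogbeat.yml. Either way, this only applies to indices created after the change; existing indices keep their shard count, which is where points 3 and 4 come in.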

The 1000 shards per data node limit being introduced in Elasticsearch 7.0 is intended as a safety limit, because we typically see increased instability in clusters with very high shard counts. The amount of data you're ingesting could easily be handled with a much lower shard count, which would also make more efficient use of the resources you already have.
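
If you want to see exactly how close you are to that limit, cluster health gives you the shard totals - something like:

curl 'localhost:9200/_cluster/health?pretty'

and then look at the active_shards and unassigned_shards values in the response.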

You can find information about shard sizing and overhead in this blog post as well.


(Vam Pikmin) #3

Hi Gordon,
Thanks very much for your advice.
I have noticed the CPU usage is rarely below 50% and large queries in Kibana take a while to process.
I'm not sure about the exact terminology - when I said four indexes, I meant that I have four patterns:
logstash
netflow
cisco-asa
winlogbeat
Like you said, each of these does have multiple daily indices (shards?). For today it looks like this:

>     green  open .monitoring-es-6-2019.03.15       TptkVm32RgmNfkoFy9Hcmw 1 0 4266355 134071   2.3gb   2.3gb
>     yellow open cisco-asa-2019.03.15              9nP54981RGC2A7XrGhcIVg 5 1  157585      0  71.1mb  71.1mb
>     yellow open netflow-2019.03.15                O8Vw1X8HTYW8nVyjtHZmZQ 5 1 7276889      0   1.8gb   1.8gb
>     yellow open winlogbeat-6.6.1-2019.03.15       Cmi0nRHgQsu6OM7-NOMxwA 5 1  299917      0 302.2mb 302.2mb
>     yellow open logstash-2019.03.15               nXTQFOWtQoOE47Vvy61neA 5 1    1157      0 787.3kb 787.3kb
>     green  open .monitoring-logstash-6-2019.03.15 lNpRb_g7Qwa6XA8zv_1mSA 1 0  509288      0  94.8mb  94.8mb
>     green  open .monitoring-kibana-6-2019.03.15   rFBTfa9jRhWAEOXExK_GLQ 1 0    8640      0   2.2mb   2.2mb

When I run
curl -XGET localhost:9200/logstash*?pretty

"settings" : {
"index" : {
"refresh_interval" : "5s",
"number_of_shards" : "5",
"provided_name" : "logstash-2019.03.19",
"creation_date" : "1552954079105",
"number_of_replicas" : "1",
"uuid" : "BT7wVtIESB2U0XgErtDLVA",
"version" : {
"created" : "6060099"

Does that mean I have 5 shards per index?

I have been deleting netflow indices older than 2 months. I have been keeping the rest for now.

I will read the articles you linked; I'm having a bit of trouble grasping it all.

For anything older than a month that I want to keep, should I somehow close it so it's not using any resources, and then open it again if I need to go back to it?

Thank you


(Vam Pikmin) #4

Do you have any examples for me to test the reindex?

Can I specify a from and to date when I use the reindex command, or does it always have to be run on all of the indices (logstash-*)?
Will the new index be a single logstash-copy, or will this be a copy of the data per day as before?
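
I'm guessing it would be something along these lines, with a range query under "source" (assuming the timestamp field is @timestamp), but I don't know if that's the right way to do it:

curl -XPOST localhost:9200/_reindex -H 'Content-Type: application/json' -d '{
  "source": {
    "index": "logstash-*",
    "query": { "range": { "@timestamp": { "gte": "2018-06-01", "lt": "2018-07-01" } } }
  },
  "dest": { "index": "logstash-2018.06" }
}'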
Sorry for asking stupid questions

I've worked out the close command finally... This is after using logstash for over a year. Some of us are slow.
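
In case it helps anyone else finding this thread, it's just something like the below (the index name is only an example), with the matching _open call to bring it back:

curl -XPOST 'localhost:9200/cisco-asa-2018.06.01/_close'
curl -XPOST 'localhost:9200/cisco-asa-2018.06.01/_open'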

I've started with this command now as a test, combining the old daily indices into a monthly one:

curl -XPOST localhost:9200/_reindex -H 'Content-Type: application/json' -d '{
  "source": { "index": "cisco-asa-2018.06.*" },
  "dest": { "index": "cisco-asa-2018.06" }
}'