There are [9374] open shards in this cluster, but the cluster is limited to [1000] per data node, for [1000] maximum.
Hi everyone,
I'm using only one host and four indexes, ingesting winlogbeat, netflow and syslog data. Of those three sources, winlogbeat and netflow can get up to 1 GB a day.
The server is running on a Linux VM under ESXi. If I had to choose between allocating more resources to the existing VM or creating new VMs and making them replicas, knowing that they would still use the same storage, what would be the best practice to make it more responsive?
First off, the documentation link is broken - that's a known issue for Elasticsearch 6.6, and will be fixed in an upcoming release. You can find the correct documentation here.
Are you certain that you're only using four indices? For example, by default, winlogbeat will create a new index each day, with the name pattern winlogbeat-<version>-yyyy.MM.dd. You can see all your indices by running GET _cat/indices?v.
Assuming your setup does indeed have 9374 shards on one node, I'm a bit surprised you haven't already had significant problems. Instead of allocating more resources to one node or adding more nodes, I'd suggest you invest some time in:
1. Adjusting the index template configuration of your Beats (and other ingest sources) to create fewer shards (sketched below),
2. Deleting some old indices, if possible,
3. Using the Shrink API to reduce the number of shards you have (also sketched below),
4. Using Reindexing to collect the data from many small indices into a smaller number of larger indices.
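To make items 1 and 3 concrete, here's a rough sketch in Kibana Dev Tools syntax. The template name, index names, shard counts and the replica setting are illustrative placeholders rather than exact values for your cluster, so adjust them to match your own indices and Beats versions:

# 1) A higher-priority template so that newly created daily indices get a
#    single primary shard (and no replica, which can't be assigned on a
#    single-node cluster anyway).
PUT _template/single-shard-override
{
  "index_patterns": ["winlogbeat-*", "netflow-*", "cisco-asa-*", "logstash-*"],
  "order": 10,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

# 3) Shrink an existing five-shard daily index down to one shard.
#    First make the source index read-only:
PUT netflow-2019.03.15/_settings
{
  "settings": { "index.blocks.write": true }
}

#    Then shrink it into a new single-shard index:
POST netflow-2019.03.15/_shrink/netflow-2019.03.15-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

#    Once you've confirmed the new index looks right, delete the original:
DELETE netflow-2019.03.15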
The 1000-shards-per-data-node limit being introduced in Elasticsearch 7.0 is intended as a safety limit, as we typically see increased instability in clusters with very high shard counts. The amount of data you're ingesting could easily be handled with a much lower shard count, which would also make more efficient use of the resources you already have.
You can find information about shard sizing and overhead in this blog post as well.
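If you want to see exactly where the cluster stands today, a couple of read-only requests will show the totals; neither of these changes anything:

# Summary counts for the whole cluster
GET _cluster/health?filter_path=active_shards,active_primary_shards,unassigned_shards

# One line per shard, showing which index each belongs to
GET _cat/shards?v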
Hi Gordon,
Thanks very much for your advice.
I have noticed the CPU usage is rarely below 50% and large queries in Kibana take a while to process.
I'm not sure about the exact wording; when I said four indexes, I meant I have four index patterns:
logstash
netflow
cisco-asa
winlogbeat
Like you said, each of these does have multiple files (shards?) separated by day. For today it looks like this:
> green open .monitoring-es-6-2019.03.15 TptkVm32RgmNfkoFy9Hcmw 1 0 4266355 134071 2.3gb 2.3gb
> yellow open cisco-asa-2019.03.15 9nP54981RGC2A7XrGhcIVg 5 1 157585 0 71.1mb 71.1mb
> yellow open netflow-2019.03.15 O8Vw1X8HTYW8nVyjtHZmZQ 5 1 7276889 0 1.8gb 1.8gb
> yellow open winlogbeat-6.6.1-2019.03.15 Cmi0nRHgQsu6OM7-NOMxwA 5 1 299917 0 302.2mb 302.2mb
> yellow open logstash-2019.03.15 nXTQFOWtQoOE47Vvy61neA 5 1 1157 0 787.3kb 787.3kb
> green open .monitoring-logstash-6-2019.03.15 lNpRb_g7Qwa6XA8zv_1mSA 1 0 509288 0 94.8mb 94.8mb
> green open .monitoring-kibana-6-2019.03.15 rFBTfa9jRhWAEOXExK_GLQ 1 0 8640 0 2.2mb 2.2mb
When I run
curl -XGET localhost:9200/logstash*?pretty
Do you have any examples for me to test the reindex?
Can I specify a from and to date when I use the reindex command, or does it always have to be run on all of the indices (logstash-*)?
Will the new index be a single logstash-copy, or will this be a copy of the data per day as before?
Sorry for asking stupid questions
I've finally worked out the close command... and that's after using Logstash for over a year. Some of us are slow.
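For reference, this is the sort of thing I mean; the index pattern is just an example from my naming scheme:

# Close January's daily indices; closed indices stop using heap,
# but the data isn't searchable again until they're reopened.
POST logstash-2019.01.*/_close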
I've started with this command now as a test, combining the old daily indices into a monthly one:
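(Something roughly along these lines; the index names and the date window are illustrative placeholders rather than the exact command, and I'd adapt them to whichever month I'm testing with.)

# Copy all of February's daily logstash indices into one monthly index.
# The optional range query limits the copy to a from/to window instead of
# taking everything that matches the source pattern.
POST _reindex
{
  "source": {
    "index": "logstash-2019.02.*",
    "query": {
      "range": {
        "@timestamp": {
          "gte": "2019-02-01",
          "lt": "2019-03-01"
        }
      }
    }
  },
  "dest": {
    "index": "logstash-2019.02"
  }
}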