Best practices for tuning the performance of Elasticsearch, Logstash, and Kibana?

Hello

We have an Elastic Stack setup. Our basic architecture is based on this:

Beats/source/etc -> Logstash -> Elasticsearch.

Meaning, all incoming data always passes through Logstash first.

We have created a separate Logstash configuration file for each incoming data source. Examples: 01-winlogbeat-mssqllogin, 02-winlogbeat-rdpwinsecurity, 03-winlogbeat-winsecurity, 04-winlogbeat-activediretorylogin, 05-syslog-firewalllogin, etc.

In those configuration files I establish the input-filter-output stages.

One of the "issues" I'm facing right now, thinking about the future, is: what happens if I set up a LAMP server? Winlogbeat has only ONE configuration, in which I can point to ONE Logstash pipeline.

I'm having that issue on SQL Servers: they all point to 01-winlogbeat-mssqllogin, but if I also want to monitor Windows logons, I have to add an if in the output section to write those events to another index.

With Logstash configuration files, each one needs its own dedicated pipeline, as various inputs cannot be on the same pipeline (it errored out for me). Is there any tuning that can be done? I've read about persistent queues, but I see there are performance penalties, so I am not sure whether I should implement them.

On the Elasticsearch side of things, with more and more data coming in, I notice that searching is getting slower and slower. We increased the VM to, I believe, 32GB of RAM and the Java heap size to 16GB (taking into account that the heap shouldn't exceed half of the available RAM), but it hasn't helped much.

Logstash has the Java heap size set to 8GB I believe.

On the Kibana side of things, performance is OK.

Any tips or settings that are recommended?

Thanks and sorry for the long text.

What is the output from the _cluster/stats?pretty&human API?

Here you go:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "M-LVQtdNQ-mbx7vzflTw-w",
  "timestamp" : 1622017082669,
  "status" : "yellow",
  "indices" : {
    "count" : 2038,
    "shards" : {
      "total" : 2038,
      "primaries" : 2038,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 106083445,
      "deleted" : 57953
    },
    "store" : {
      "size" : "77.2gb",
      "size_in_bytes" : 82964153542,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "110.9kb",
      "memory_size_in_bytes" : 113624,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "76.7kb",
      "memory_size_in_bytes" : 78624,
      "total_count" : 28456805,
      "hit_count" : 21602842,
      "miss_count" : 6853963,
      "cache_size" : 83,
      "cache_count" : 117958,
      "evictions" : 117875
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 12934,
      "memory" : "424.2mb",
      "memory_in_bytes" : 444820514,
      "terms_memory" : "351.9mb",
      "terms_memory_in_bytes" : 368997576,
      "stored_fields_memory" : "6.2mb",
      "stored_fields_memory_in_bytes" : 6532496,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "51.9mb",
      "norms_memory_in_bytes" : 54440128,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "14.1mb",
      "doc_values_memory_in_bytes" : 14850314,
      "index_writer_memory" : "777.8mb",
      "index_writer_memory_in_bytes" : 815587072,
      "version_map_memory" : "1.1mb",
      "version_map_memory_in_bytes" : 1193910,
      "fixed_bit_set" : "8.3kb",
      "fixed_bit_set_memory_in_bytes" : 8536,
      "max_unsafe_auto_id_timestamp" : 1622010629251,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 101,
          "index_count" : 23
        },
        {
          "name" : "byte",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "constant_keyword",
          "count" : 3,
          "index_count" : 1
        },
        {
          "name" : "date",
          "count" : 4422,
          "index_count" : 2019
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 4,
          "index_count" : 2
        },
        {
          "name" : "double_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "flattened",
          "count" : 9,
          "index_count" : 1
        },
        {
          "name" : "float",
          "count" : 92,
          "index_count" : 37
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 325,
          "index_count" : 325
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 585,
          "index_count" : 293
        },
        {
          "name" : "integer",
          "count" : 30,
          "index_count" : 4
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 81,
          "index_count" : 81
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 124022,
          "index_count" : 2018
        },
        {
          "name" : "long",
          "count" : 11614,
          "index_count" : 2009
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 11,
          "index_count" : 6
        },
        {
          "name" : "object",
          "count" : 25284,
          "index_count" : 2021
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "text",
          "count" : 123718,
          "index_count" : 2012
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "coordinating_only" : 0,
      "data" : 1,
      "data_cold" : 1,
      "data_content" : 1,
      "data_hot" : 1,
      "data_warm" : 1,
      "ingest" : 1,
      "master" : 1,
      "ml" : 1,
      "remote_cluster_client" : 1,
      "transform" : 1,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.1"
    ],
    "os" : {
      "available_processors" : 4,
      "allocated_processors" : 4,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "31.2gb",
        "total_in_bytes" : 33530023936,
        "free" : "235.8mb",
        "free_in_bytes" : 247316480,
        "used" : "30.9gb",
        "used_in_bytes" : 33282707456,
        "free_percent" : 1,
        "used_percent" : 99
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 2
      },
      "open_file_descriptors" : {
        "min" : 11709,
        "max" : 11709,
        "avg" : 11709
      }
    },
    "jvm" : {
      "max_uptime" : "13.7d",
      "max_uptime_in_millis" : 1188136911,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "10gb",
        "heap_used_in_bytes" : 10837510144,
        "heap_max" : "16gb",
        "heap_max_in_bytes" : 17179869184
      },
      "threads" : 184
    },
    "fs" : {
      "total" : "199.9gb",
      "total_in_bytes" : 214691745792,
      "free" : "119.5gb",
      "free_in_bytes" : 128410861568,
      "available" : "119.5gb",
      "available_in_bytes" : 128410861568
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    },
    "discovery_types" : {
      "single-node" : 1
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 1
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 16,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "convert" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "user_agent" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}
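
To make the numbers above easier to read, here is a minimal Python sketch that derives a few per-shard averages from values copied out of this output. The function name `summarize` and the trimmed `stats` dictionary are illustrative only; a real script would load the full JSON response instead.

```python
# Values copied from the _cluster/stats output above (trimmed to what is used).
stats = {
    "indices": {
        "shards": {"total": 2038},
        "docs": {"count": 106083445},
        "store": {"size_in_bytes": 82964153542},
    },
    "nodes": {"jvm": {"mem": {"heap_max_in_bytes": 17179869184}}},
}


def summarize(stats):
    """Return per-shard averages (hypothetical helper, not an Elasticsearch API)."""
    indices = stats["indices"]
    shards = indices["shards"]["total"]
    heap_gib = stats["nodes"]["jvm"]["mem"]["heap_max_in_bytes"] / 2**30
    return {
        "shards": shards,
        "avg_shard_mib": indices["store"]["size_in_bytes"] / shards / 2**20,
        "avg_docs_per_shard": indices["docs"]["count"] / shards,
        "shards_per_gib_heap": shards / heap_gib,
    }


summary = summarize(stats)
for key, value in summary.items():
    print(f"{key}: {value:.1f}")
```

With these inputs the average shard comes out at a few tens of MiB, which suggests the node is holding a large number of very small shards.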

Why is the status yellow? I can answer that: I believe the replicas are set to two by default, so since there is only one node, it can't replicate them to another node.

It's yellow because you only have one node, so the replicas cannot be allocated.

However you have way too many shards. You should be at 1/4 of what you have now.

Well, why do I need to fix that? Just to get a better understanding.

And afterwards, how do I fix it?

Thanks

Fixing the state to be green by allocating replicas is important as you have better data availability.
Minimising the shard count reduces per-shard overhead, so the node's resources are used more efficiently.

But, at least as far as I know, I need two nodes for replicas?

I can change the replicas to be 0 (or I THINK I can; where do I change that?) so the state is green but would that improve performance?
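
On the "where do I change that?" part: the replica count is a dynamic index setting that can be changed through the update index settings API (`PUT /<index>/_settings`). The sketch below only builds such a request with Python's standard library and does not send it. The host `http://localhost:9200`, the `*` index pattern, and the helper name are assumptions made for illustration, and a cluster with security enabled would also need credentials. Index templates would need the same setting for newly created indices to pick it up.

```python
import json
import urllib.request


def build_replica_request(base_url, index_pattern, replicas):
    """Build (but do not send) a PUT <index>/_settings request.

    base_url and index_pattern are placeholders chosen for illustration.
    """
    body = {"index": {"number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_replica_request("http://localhost:9200", "*", 0)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To apply it, pass `request` to urllib.request.urlopen(...) against a real cluster.
```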

A single node green state has no performance improvements, and puts you at risk of data loss.
Adding a second node is extremely useful.

I agree that a second node is useful, but the client only wants one, even though we pointed out that this is a cluster and, as the word itself suggests, clusters tend to be multi-node.

I've changed it to only one replica and, honestly, Kibana searches do feel faster.

I ran the command again and this is what I have now:

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "M-LVQtdNQ-mbx7vzflTw-w",
  "timestamp" : 1622729775647,
  "status" : "yellow",
  "indices" : {
    "count" : 2258,
    "shards" : {
      "total" : 2258,
      "primaries" : 2258,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "primaries" : {
          "min" : 1,
          "max" : 1,
          "avg" : 1.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 115950967,
      "deleted" : 26816
    },
    "store" : {
      "size" : "84.4gb",
      "size_in_bytes" : 90710075265,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "43.1kb",
      "memory_size_in_bytes" : 44208,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "57.5kb",
      "memory_size_in_bytes" : 58968,
      "total_count" : 4922256,
      "hit_count" : 3732319,
      "miss_count" : 1189937,
      "cache_size" : 82,
      "cache_count" : 20459,
      "evictions" : 20377
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 14443,
      "memory" : "472.1mb",
      "memory_in_bytes" : 495083158,
      "terms_memory" : "391.6mb",
      "terms_memory_in_bytes" : 410677488,
      "stored_fields_memory" : "6.9mb",
      "stored_fields_memory_in_bytes" : 7290872,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "57.8mb",
      "norms_memory_in_bytes" : 60613760,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "15.7mb",
      "doc_values_memory_in_bytes" : 16501038,
      "index_writer_memory" : "966.5mb",
      "index_writer_memory_in_bytes" : 1013513944,
      "version_map_memory" : "4.1mb",
      "version_map_memory_in_bytes" : 4307343,
      "fixed_bit_set" : "4.3kb",
      "fixed_bit_set_memory_in_bytes" : 4440,
      "max_unsafe_auto_id_timestamp" : 1622706362999,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "binary",
          "count" : 15,
          "index_count" : 4
        },
        {
          "name" : "boolean",
          "count" : 101,
          "index_count" : 23
        },
        {
          "name" : "byte",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "constant_keyword",
          "count" : 3,
          "index_count" : 1
        },
        {
          "name" : "date",
          "count" : 4892,
          "index_count" : 2239
        },
        {
          "name" : "date_nanos",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "double",
          "count" : 4,
          "index_count" : 2
        },
        {
          "name" : "double_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "flattened",
          "count" : 9,
          "index_count" : 1
        },
        {
          "name" : "float",
          "count" : 92,
          "index_count" : 37
        },
        {
          "name" : "float_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "geo_point",
          "count" : 356,
          "index_count" : 356
        },
        {
          "name" : "geo_shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "half_float",
          "count" : 647,
          "index_count" : 324
        },
        {
          "name" : "integer",
          "count" : 30,
          "index_count" : 4
        },
        {
          "name" : "integer_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "ip",
          "count" : 89,
          "index_count" : 89
        },
        {
          "name" : "ip_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 137638,
          "index_count" : 2238
        },
        {
          "name" : "long",
          "count" : 12943,
          "index_count" : 2229
        },
        {
          "name" : "long_range",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "nested",
          "count" : 11,
          "index_count" : 6
        },
        {
          "name" : "object",
          "count" : 27973,
          "index_count" : 2233
        },
        {
          "name" : "shape",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "short",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "text",
          "count" : 137334,
          "index_count" : 2232
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [ ],
      "filter_types" : [
        {
          "name" : "pattern_capture",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [
        {
          "name" : "uax_url_email",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_filters" : [
        {
          "name" : "lowercase",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "unique",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_analyzers" : [ ]
    }
  },
  "nodes" : {
    "count" : {
      "total" : 1,
      "coordinating_only" : 0,
      "data" : 1,
      "data_cold" : 1,
      "data_content" : 1,
      "data_hot" : 1,
      "data_warm" : 1,
      "ingest" : 1,
      "master" : 1,
      "ml" : 1,
      "remote_cluster_client" : 1,
      "transform" : 1,
      "voting_only" : 0
    },
    "versions" : [
      "7.10.1"
    ],
    "os" : {
      "available_processors" : 4,
      "allocated_processors" : 4,
      "names" : [
        {
          "name" : "Linux",
          "count" : 1
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 1
        }
      ],
      "mem" : {
        "total" : "31.2gb",
        "total_in_bytes" : 33530023936,
        "free" : "396.1mb",
        "free_in_bytes" : 415391744,
        "used" : "30.8gb",
        "used_in_bytes" : 33114632192,
        "free_percent" : 1,
        "used_percent" : 99
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 2
      },
      "open_file_descriptors" : {
        "min" : 12854,
        "max" : 12854,
        "avg" : 12854
      }
    },
    "jvm" : {
      "max_uptime" : "6.3d",
      "max_uptime_in_millis" : 546049293,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 1
        }
      ],
      "mem" : {
        "heap_used" : "10.2gb",
        "heap_used_in_bytes" : 11020796928,
        "heap_max" : "16gb",
        "heap_max_in_bytes" : 17179869184
      },
      "threads" : 183
    },
    "fs" : {
      "total" : "199.9gb",
      "total_in_bytes" : 214691745792,
      "free" : "112.2gb",
      "free_in_bytes" : 120488898560,
      "available" : "112.2gb",
      "available_in_bytes" : 120488898560
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 1
      },
      "http_types" : {
        "security4" : 1
      }
    },
    "discovery_types" : {
      "single-node" : 1
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 1
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 16,
      "processor_stats" : {
        "conditional" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "convert" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "geoip" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "grok" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "remove" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "user_agent" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

One of the things I am seeing is that my memory usage is constantly over 90%. Is there any way I can keep that down?

Thank you

You've got way too many shards for a single node. You should be aiming for <600.
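
As a quick check of the two figures given in this thread, the arithmetic below compares the shard count from the second stats output with the suggested ceiling of 600. The variable names are illustrative only.

```python
current_shards = 2258   # "shards.total" from the second _cluster/stats output
target_shards = 600     # upper bound suggested in the reply above

reduction_factor = current_shards / target_shards
excess_shards = current_shards - target_shards

print(f"reduce by a factor of about {reduction_factor:.1f}")
print(f"shards to eliminate: {excess_shards}")
```

A factor of a little under 4 is in line with the earlier advice to get down to about a quarter of the current count.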

How do I lower the shard count?

Use the _shrink API for older indices and implement ILM for new ones.
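
One constraint worth keeping in mind with `_shrink`: the target index's primary shard count must be a factor of the source index's primary shard count, and each call shrinks one source index into one new target index. The hypothetical helper below lists the valid smaller targets for a given source count. Since the stats output in this thread reports one primary shard per index (min, max, and avg are all 1), there may be no smaller target available for these indices, so treat this as a sketch of the rule rather than a recipe.

```python
def valid_shrink_targets(source_primaries):
    """Return primary-shard counts smaller than the source that divide it evenly.

    Hypothetical helper illustrating the "must be a factor" rule; it does not
    call Elasticsearch.
    """
    if source_primaries < 1:
        raise ValueError("an index has at least one primary shard")
    return [n for n in range(1, source_primaries) if source_primaries % n == 0]


for primaries in (1, 5, 8):
    print(primaries, "->", valid_shrink_targets(primaries))
```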

Does

POST /*/_shrink/*

work?

I also imagine that I need to set index.number_of_shards or number_of_shards in the index template. What number do I set it to?