ELK performance

Hello,

I've been using ELK for a while now (8 months), and everything is starting to get slow. For example, it takes 3 or 4 minutes to display the dashboard for the current day.

I think it's because no tuning has been done on the install (it was set up by a third party using Bitnami).

Here are the stats of my Elasticsearch cluster:

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "timestamp": 1520937046119,
  "status": "yellow",
  "indices": {
    "count": 386,
    "shards": {
      "total": 1470,
      "primaries": 1470,
      "replication": 0,
      "index": {
        "shards": {
          "min": 1,
          "max": 5,
          "avg": 3.8082901554404147
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 3.8082901554404147
        },
        "replication": {
          "min": 0,
          "max": 0,
          "avg": 0
        }
      }
    },
    "docs": {
      "count": 56143114,
      "deleted": 861594
    },
    "store": {
      "size": "18.1gb",
      "size_in_bytes": 19435480122,
      "throttle_time": "0s",
      "throttle_time_in_millis": 0
    },
    "fielddata": {
      "memory_size": "1.2mb",
      "memory_size_in_bytes": 1286376,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "total_count": 196,
      "hit_count": 0,
      "miss_count": 196,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 7485,
      "memory": "94.5mb",
      "memory_in_bytes": 99113518,
      "terms_memory": "75.8mb",
      "terms_memory_in_bytes": 79568407,
      "stored_fields_memory": "7.1mb",
      "stored_fields_memory_in_bytes": 7546504,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "199.1kb",
      "norms_memory_in_bytes": 203968,
      "points_memory": "2.8mb",
      "points_memory_in_bytes": 3016067,
      "doc_values_memory": "8.3mb",
      "doc_values_memory_in_bytes": 8778572,
      "index_writer_memory": "0b",
      "index_writer_memory_in_bytes": 0,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "39.2kb",
      "fixed_bit_set_memory_in_bytes": 40168,
      "max_unsafe_auto_id_timestamp": -1,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 1,
      "data": 1,
      "coordinating_only": 0,
      "master": 1,
      "ingest": 1
    },
    "versions": [
      "5.4.1"
    ],
    "os": {
      "available_processors": 8,
      "allocated_processors": 8,
      "names": [
        {
          "name": "Windows Server 2012 R2",
          "count": 1
        }
      ],
      "mem": {
        "total": "31.9gb",
        "total_in_bytes": 34359271424,
        "free": "26gb",
        "free_in_bytes": 27950968832,
        "used": "5.9gb",
        "used_in_bytes": 6408302592,
        "free_percent": 81,
        "used_percent": 19
      }
    },
    "process": {
      "cpu": {
        "percent": 18
      },
      "open_file_descriptors": {
        "min": -1,
        "max": -1,
        "avg": 0
      }
    },
    "jvm": {
      "max_uptime": "38.4m",
      "max_uptime_in_millis": 2304563,
      "versions": [
        {
          "version": "1.8.0_131",
          "vm_name": "Java HotSpot(TM) Server VM",
          "vm_version": "25.131-b11",
          "vm_vendor": "Oracle Corporation",
          "count": 1
        }
      ],
      "mem": {
        "heap_used": "949.8mb",
        "heap_used_in_bytes": 995994536,
        "heap_max": "989.8mb",
        "heap_max_in_bytes": 1037959168
      },
      "threads": 81
    },
    "fs": {
      "total": "59.6gb",
      "total_in_bytes": 64055406592,
      "free": "10.6gb",
      "free_in_bytes": 11409387520,
      "available": "10.6gb",
      "available_in_bytes": 11409387520
    },
    "plugins": [],
    "network_types": {
      "transport_types": {
        "netty4": 1
      },
      "http_types": {
        "netty4": 1
      }
    }
  }
}

You have far too many shards given the size of the cluster (single node with only 1GB heap) and volume of data. Please read this blog post and then alter your sharding strategy and try to reduce the shard count significantly.
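For indices that already exist, one possible approach (just a sketch; the index name logstash-2018.03.01 below is a placeholder for whatever your indices are actually called) is the _shrink API, available since 5.0: mark the source index read-only, then copy it into a new single-shard index.

# make the source index read-only so it can be shrunk
PUT logstash-2018.03.01/_settings
{
  "index.blocks.write": true
}

# shrink the multi-shard index into a new single-shard index
POST logstash-2018.03.01/_shrink/logstash-2018.03.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

Once the shrunken index is healthy you can delete the original and, if needed, point an alias with the old name at the new index.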

Hi @Slop,

Reading the stats, your cluster seems "fine": no cache pressure, no out-of-memory errors (26 GB free!). The filesystem has 10 GB of space remaining (maybe not enough?).

Maybe, over time, your data range has grown bigger than it was at the beginning (more docs over time), so your searches take longer to complete.

You can monitor your CPU usage while querying the day's dashboard, to see if any anomaly shows up.
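If you want to see what the node is actually spending its CPU on while the dashboard is loading, the hot threads API is a quick way to get a snapshot:

GET _nodes/hot_threads

Run it a few times while the dashboard query is in flight and look for search or merge threads dominating the output.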

Your cluster is yellow; I don't know why, but it could be part of the issue. Check it out.
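To find out why it is yellow, a couple of read-only requests (run from Kibana Dev Tools or with curl against port 9200) will show which index is responsible: the first reports health per index, the second lists any UNASSIGNED shards together with the reason.

GET _cluster/health?level=indices

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason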

Let me add some other resources about sizing which might help:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

The first thing you MUST do is give Elasticsearch more JVM heap space to work with. You can do this by editing the jvm.options file for Elasticsearch, which is usually found at /etc/elasticsearch/jvm.options.

In this file the first section allows you to set the initial and max JVM Heap Size. It will probably look like this on your node...

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms1g
-Xmx1g

Since you have plenty of RAM available, change these two settings to 8 GB (it could be 12 or even 16 GB, but start at 8).

-Xms8g
-Xmx8g

Restart Elasticsearch and you will probably be fine.
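After the restart it is worth double-checking that the new heap size actually took effect (and, as a rule of thumb, keep the heap at no more than about half of the machine's RAM):

GET _cat/nodes?v&h=name,heap.current,heap.percent,heap.max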

It is still recommended that you watch heap utilization and make sure you don't have too many indices and shards. You really don't have very much data, but it could probably be organized better. For example, if you are currently writing daily indices, you may want to change to monthly.
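As a sketch of that reorganization (assuming your indices follow the default logstash-* naming), an index template like the one below makes every new index get a single primary shard instead of the default five. Combined with changing the Logstash elasticsearch output from index => "logstash-%{+YYYY.MM.dd}" to index => "logstash-%{+YYYY.MM}", you would end up with one single-shard index per month instead of five shards per day.

# order 1 so these settings win over the default template Logstash installs (normally order 0)
PUT _template/logstash-single-shard
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}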

Thanks everybody for your answers, I was away for a while.
I'll try to reduce the number of shards and increase the JVM heap size.
I've reset everything and will try again with a new config.

Kind regards,
Nicolas
