Performance Issues and timeouts with Elasticsearch

Hello everyone,

I have been facing strange performance issues over the last few days.

Here is my development cluster setup:

  • Single Node
  • 12GB RAM
  • 4 Cores
  • Spinning Disks
  • A total of 45 million documents (~40GB of data) spread across 80 indexes and 80 shards
  • Four different daily indexes, with data from 100MB up to 3GB

I allocated 6GB of RAM to the JVM heap.
Because this installation is meant for getting to know the ELK stack, Logstash and Kibana are also running on that machine.

Now my problem: when I open a dashboard (last 3 days) that includes visualisations with cross-index searches, Elasticsearch becomes unreachable after a while. It seems to be working hard, but in top neither CPU, RAM, nor IO wait shows unusually high values. The Elasticsearch log doesn't show any errors either, but Kibana runs into its timeout (120s in my case).
Another interesting fact is that Elasticsearch cannot be stopped after this freeze, and the query
curl -XGET localhost:9200 returns nothing (it just waits for a response)
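
To tell a frozen node apart from a slow one without blocking the shell, the same request can be sent with a client-side timeout. The following is only a minimal sketch using the Python standard library; the function name probe and the 5 second default are illustrative choices, not anything from this setup.

```python
import urllib.error
import urllib.request


def probe(url="http://localhost:9200", timeout=5.0):
    """Return the HTTP status code, or None if the node does not answer in time."""
    # Ignore proxy settings from the environment; the probe targets a local node.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The node answered, just not with a 2xx status
        # (for example 401 when security is enabled and no credentials are sent).
        return err.code
    except OSError:
        # Refused connections and timeouts both end up here
        # (URLError and socket.timeout are OSError subclasses).
        return None


if __name__ == "__main__":
    status = probe()
    print("no answer within timeout" if status is None else f"HTTP {status}")
```

A result of None while the process is still running would match the freeze described above: the port accepts no request or never answers it.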

The picture shows the moment when elasticsearch is freezing.

Of course this is not much information, but maybe someone has a hint about what I could improve or what I can check to find the error. Maybe the hardware is too weak for this amount of data?

Today I reindexed all indexes that had more than one primary shard. Now every index (including the system indexes) has one primary shard.

After that the system was stable.
Of course, when we go into production with a multi-node cluster, I'll change it back to two primary shards.

A strange fact is that when I use the Reporting tool, a new index with five primary shards is created, and then Elasticsearch freezes again. But when I create a second report, which is stored in the same index, it does not freeze.

I don't know whether it has something to do with the shard count, or maybe with something like garbage collection.

With 40GB of data, 80 indexes and 80 shards may be far too many for 6GB of JVM heap. Try reindexing the 40GB of data into 1 index with 2 to 4 primary shards and see if it helps. For comparison, the average size of my indexes is 100GB.
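
To put numbers on that suggestion, here is a rough sketch of the average shard size before and after such a reindex. It only uses the figures mentioned in this thread (40GB, 80 shards, 2 to 4 primary shards); the helper name is made up for illustration, and the result says nothing about what shard size is ideal for a given cluster.

```python
def average_shard_size_gb(total_data_gb, shard_count):
    """Average amount of data per shard, assuming data is spread evenly."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return total_data_gb / shard_count


if __name__ == "__main__":
    total_gb = 40  # total data size mentioned in the thread

    # Current layout: 80 shards.
    print("80 shards:", average_shard_size_gb(total_gb, 80), "GB per shard")

    # Suggested layout: one index with 2 to 4 primary shards.
    for shards in (2, 3, 4):
        size = average_shard_size_gb(total_gb, shards)
        print(f"{shards} shards: {size:.1f} GB per shard")
```

With 80 shards the average is only 0.5GB per shard, so most of the per-shard overhead is spent on very little data.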

I was a little confused that sometimes it runs like a charm and sometimes these freezes happen.
I found out (or at least it seems to be the reason) that when I log in with the built-in "elastic" user, everything works and the dashboards load as fast as expected. But as soon as I log in with my domain user (connected via an Active Directory realm), it freezes every time.

Here is some additional information:

  1. role_mapping.yml:

  • "cn=my_name,ou=my_ou,dc=path_to,dc=my,dc=domain"

  2. The authentication realm chain looks like this:

          type: native
          order: 0

          type: active_directory
          order: 1
          domain_name: my_domain
          url: ldap://my_ldap_server
          unmapped_groups_as_roles: true

  3. In the elasticsearch.log (with the cluster-wide log level set to debug), it continuously tries to load roles:

[2016-11-28T11:10:03,058][DEBUG][o.e.x.s.a.s.NativeRolesStore] [de-elk01] attempting to load role [ROLE] from index

Here ROLE is any group that my Active Directory account belongs to. This event is logged about 5000 times a second. Of course, not all of those groups are configured as roles in X-Pack (some groups exist to manage other applications).
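
A rate like this can be measured by counting the matching debug lines per second. The sketch below assumes the log lines look exactly like the one quoted above; the function name and the group names in the sample (group_a, group_b) are hypothetical.

```python
import re
from collections import Counter

# Matches the NativeRolesStore debug line quoted above and captures
# the timestamp (to the second) and the role name.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}),\d{3}\]"
    r"\[DEBUG\s*\]\[o\.e\.x\.s\.a\.s\.NativeRolesStore\s*\]"
    r".*attempting to load role \[(?P<role>[^\]]+)\] from index"
)


def role_lookups(lines):
    """Count role lookups per second and per role name."""
    per_second = Counter()
    per_role = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            per_second[match.group("ts")] += 1
            per_role[match.group("role")] += 1
    return per_second, per_role


if __name__ == "__main__":
    # Hypothetical sample lines in the format quoted above.
    prefix = "[DEBUG][o.e.x.s.a.s.NativeRolesStore] [de-elk01] attempting to load role"
    sample = [
        f"[2016-11-28T11:10:03,058]{prefix} [group_a] from index",
        f"[2016-11-28T11:10:03,059]{prefix} [group_b] from index",
        f"[2016-11-28T11:10:04,001]{prefix} [group_a] from index",
    ]
    per_second, per_role = role_lookups(sample)
    print(per_second.most_common(5))
    print(per_role.most_common(5))
```

Passing an open file handle for the Elasticsearch log instead of the sample list gives the busiest seconds and the most frequently requested role names.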

Here is the solution: the setting unmapped_groups_as_roles: true in elasticsearch.yml caused the problem. I don't know why this happened, but after setting it to false, everything works as expected.
