ES 2.3 -> 5.2 Memory Issues

I'm running an ELK stack on a single Windows machine with 64 GB of RAM and 2 TB of disk space for storage.

I first set this up using ES 2.3 and had a proof of concept running pretty smoothly with a heap size of 30g. I decided to rebuild the stack on ES 5.2 but am unfortunately running into a lot of issues with the heap. Whenever I try to load a dashboard, even over a relatively small time window, the heap fills up, dumps, and crashes Elasticsearch. I tried to keep the configuration mostly the same between the two installations, but there are a couple of changes worth noting:

  • I installed X-Pack across the stack with a basic license for status monitoring
  • I pointed path.data at an external data drive on the device, i.e. a separate physical disk from the one ELK and the OS are installed on. I suspect this may be causing some sort of issue but don't know enough to be sure.
  • bootstrap.memory_lock is now set to true, although I'd expect this to help rather than hurt

The rest of the configs are mostly at their default values, but I'd be interested to hear any other suggestions for optimizing a configuration in my "single machine" situation.
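
For reference, the relevant parts of my elasticsearch.yml look roughly like this (the data path below is a placeholder rather than my exact drive/folder; the heap itself is set to 30g in jvm.options via -Xms30g/-Xmx30g):

      # elasticsearch.yml (excerpt, illustrative)
      path.data: D:\elasticsearch\data    # separate physical disk from the OS/ELK install
      bootstrap.memory_lock: true         # lock the JVM heap in RAM so it can't be swapped out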

Thank you for taking the time to look at my issue.

Brandon

I guess it's also worth noting that I see a lot of garbage collection whenever I make a request:

[gc][183] overhead, spent [705ms] collecting in the last [1s]
[gc][203] overhead, spent [510ms] collecting in the last [1s]
[gc][204] overhead, spent [467ms] collecting in the last [1s]
[gc][211] overhead, spent [695ms] collecting in the last [1s]
[gc][212] overhead, spent [981ms] collecting in the last [1.1s]
[gc][219] overhead, spent [373ms] collecting in the last [1s]
[gc][220] overhead, spent [664ms] collecting in the last [1s]
[gc][221] overhead, spent [720ms] collecting in the last [1s]
[gc][222] overhead, spent [3.4s] collecting in the last [3.5s]
[gc][223] overhead, spent [936ms] collecting in the last [1.1s]
[gc][224] overhead, spent [895ms] collecting in the last [1s]
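
For reference, heap pressure during a dashboard load can be watched with the standard cat API (nothing custom here), e.g.:

      GET _cat/nodes?v&h=name,heap.percent,heap.current,heap.max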

What's the mapping look like?
What does Monitoring show?

I used the auto-generated mapping. In ES 2.3, when you use an index pattern starting with logstash-*, the default template automatically exposes .raw sub-fields for analyzed strings, so I did the same thing when I set up 5.2. From what I can tell, .keyword is the new equivalent, and I have been using those sub-fields when building the visualizations for my dashboards. Most of the fields I use are keyword strings with mappings that look like this:

      "app_id" : {
        "type" : "text",
        "norms" : false,
        "fields" : {
          "keyword" : {
            "type" : "keyword"
          }
        }
      },

But let me know if looking at the entire mapping would be helpful.
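
For context, the dashboard visualizations end up running aggregations against those keyword sub-fields. A representative query (field name taken from the mapping above, size values just for illustration) looks something like:

      GET logstash-*/_search
      {
        "size": 0,
        "aggs": {
          "top_apps": {
            "terms": { "field": "app_id.keyword", "size": 10 }
          }
        }
      }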

Here's the mapping section as well:

"mappings" : {
      "_default_" : {
        "_all" : {
          "enabled" : true,
          "norms" : false
        },
        "dynamic_templates" : [
          {
            "message_field" : {
              "path_match" : "message",
              "match_mapping_type" : "string",
              "mapping" : {
                "norms" : false,
                "type" : "text"
              }
            }
          },
          {
            "string_fields" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                },
                "norms" : false,
                "type" : "text"
              }
            }
          }
        ],
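
That excerpt cuts off after the dynamic templates. If the rest would be useful, the full mapping can be pulled with something like:

      GET logstash-*/_mapping

and I can post it here.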

Looking at the monitoring data for ES just seems to confirm what I observed: as Kibana makes the requests needed to generate the dashboard, there are spikes in JVM heap usage and in GC count/duration. From the command line of my ES 2.3 instance I see the same behavior (the GC rate picks up and the heap ramps up toward its limit), but it never actually crashes.

I'm using the same JRE for both installations, so I can't imagine the GC efficiency is different. Could my problem have something to do with the fact that X-Pack monitoring has created several more shards for ES to keep track of?
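
If the extra monitoring shards are relevant, they should be visible with something like:

      GET _cat/indices/.monitoring-*?v&h=index,pri,docs.count,store.size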

Hope this helps diagnose the problem!

Thanks

How big is the heap? What OS + JVM?
How many indices/shards/docs? What GB size?

Elasticsearch
JVM: JRE 1.8.0_121
JVM Heap: 30 GB (the machine has 64 GB of RAM)
OS: Windows 10
Storage: 2 TB
Index Patterns: 1
Indices: 18 (a new index is created daily)
Primary Shards: 34
Replica Shards: 0
Documents: 51 million and counting
Note: The first index contains the majority of these documents because I did a massive upload when I first started it up.

Kibana
Memory: 716 MB

Logstash
JVM Heap: 16 GB
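
If a per-index breakdown of those numbers would help, it can be pulled with the cat APIs, e.g.:

      GET _cat/indices/logstash-*?v&h=index,pri,docs.count,store.size
      GET _cat/shards/logstash-*?v&h=index,shard,prirep,docs,store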
