JVM > 90% - Small Indexes, High Shard Count

Hello,

My team is having some issues with our Elasticsearch cluster: the JVM heap keeps growing and we keep having to provide more RAM for what is a fairly simple use case (described below). Would it be wise to scale out to more instances to keep up with the reads, or is there an issue we have missed?

Use case: We batch load a new index every hour to be read by our front-end unit. The index is 20 MB in size and is read 3,000 to 4,000 times per second.
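
For context, the hourly load is essentially a fresh index plus a bulk write. A minimal sketch of the idea (the index name, type, and documents here are placeholders, not our real ones):

    PUT /scores-2015-09-25-14

    POST /scores-2015-09-25-14/score/_bulk
    { "index": { "_id": "1" } }
    { "date": "2015-09-25T14:00:00Z", "url": "http://example.com/a", "score": 42 }
    { "index": { "_id": "2" } }
    { "date": "2015-09-25T14:05:00Z", "url": "http://example.com/b", "score": 17 }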

Cluster size:
Nodes: 13, Indices: 1293, Shards: 12728, Data: 199.52 GB, CPU: 2423%, Memory: 308.73 GB / 374.25 GB
3 master nodes - 32 GB RAM, 4 cores
10 slave nodes - 64 GB RAM, 8 cores

      "jvm" : {
    "timestamp" : 1443189668624,
    "uptime_in_millis" : 169701260,
    "mem" : {
      "heap_used_in_bytes" : 30704178392,
      "heap_used_percent" : 89,
      "heap_committed_in_bytes" : 34290008064,
      "heap_max_in_bytes" : 34290008064,
      "non_heap_used_in_bytes" : 126976704,
      "non_heap_committed_in_bytes" : 129069056,
      "pools" : {
        "young" : {
          "used_in_bytes" : 189203944,
          "max_in_bytes" : 558432256,
          "peak_used_in_bytes" : 558432256,
          "peak_max_in_bytes" : 558432256
        },
        "survivor" : {
          "used_in_bytes" : 4907360,
          "max_in_bytes" : 69730304,
          "peak_used_in_bytes" : 69730304,
          "peak_max_in_bytes" : 69730304
        },
        "old" : {
          "used_in_bytes" : 30510067088,
          "max_in_bytes" : 33661845504,
          "peak_used_in_bytes" : 30778888464,
          "peak_max_in_bytes" : 33661845504
        }
      }
    },
    "threads" : {
      "count" : 103,
      "peak_count" : 129
    },
    "gc" : {
      "collectors" : {
        "young" : {
          "collection_count" : 11403,
          "collection_time_in_millis" : 640445
        },
        "old" : {
          "collection_count" : 3068,
          "collection_time_in_millis" : 317938
        }
      }
    },
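
(That is the "jvm" section of the node stats output; something like the following returns it, along with a quick per-node heap overview:)

    GET /_nodes/stats/jvm
    GET /_cat/nodes?v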

What sort of data is it? Are you using parent/child or nested documents? What sort of queries do you run?

Simple data set. Four columns: id, date, url, score.

We query based on date, url, and score, and have no parent/child relationships.
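
A representative query is something along these lines (the index name and values here are placeholders; the real queries vary):

    GET /scores-2015-09-25-14/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "url": "http://example.com/a" } },
            { "range": { "date": { "gte": "2015-09-25T14:00:00Z" } } },
            { "range": { "score": { "gte": 10 } } }
          ]
        }
      }
    }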

We continue to see high JVM heap usage and are looking into persistent connections and possible per-request overhead.

front end unit > nginx proxy > elasticsearch

I have done some more digging to find the problem. I think the high JVM heap has to do with the number of persistent connections on our system.

I added the keepalive parameter to nginx, and at the Linux OS level I now see only 450 connections instead of the 4,500 before. However, I continue to see the http "total_opened" counters rise on every node.
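
(Those counters come from the http section of the node stats API, which reports both current_open and total_opened for each node:)

    GET /_nodes/stats/http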

  • "total_opened" : 7452541
  • "total_opened" : 1770832
  • "total_opened" : 1770971
  • "total_opened" : 1770846
  • "total_opened" : 1735536
  • "total_opened" : 1770768
  • "total_opened" : 1770788

I even went as far as shutting nginx off for 15 minutes, yet the counters continued to rise. Can anyone provide some insight as to why?

Opened is a cumulative count rather than a current active one.

I discovered the issue with our setup. Each shard is a Lucene index, and Lucene has overhead when loaded into RAM, on the order of 20 to 30%.

Since we create a new index every hour, we keep piling that Lucene overhead into memory for every shard we create: 10 shards with 3 replicas = 30 new shards every hour, each with its own overhead. With almost 13,000 shards across the cluster, it adds up quickly.
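
Since each hourly index is only about 20 MB, one way to rein this in is to create it with far fewer shards. As an illustration only (the exact numbers depend on the read load):

    PUT /scores-2015-09-25-15
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 2
      }
    }

A single 20 MB shard plus a couple of replicas still lets several nodes serve the reads, while adding 3 shards an hour instead of 30.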

Just in case anyone runs into this issue in the future.