Why does heap usage keep approaching 100%?


(Brian) #1

I'm new to ES and I have an ES 1.5.2 cluster with:

  • 4 nodes.
  • Each node is currently an Amazon EC2 (c4.2xlarge) instance.
  • 16 GB RAM
  • 8 cores.
  • 8 GB RAM assigned to HEAP
  • 7.9 billion documents
  • 2656 shards
  • 6.57 TB of data
  • All indices are split by day
  • Using logstash to bulk insert data
  • Very heavy WRITE and less read. Mostly Analytics/Clickstream/Log data
  • I've used doc_types wherever I could, though it's possible I've missed some

Problem
Heap usage is very close to 100%. It seems to reach 100%, drop a little, and then climb back to 100%. I suspect garbage collection is running constantly, leaving me little to no heap for anything else. Even if I enlarge the heap to, say, 15 GB, it does the same thing: it grows towards 15 GB, cleans up a bit, and goes back to 15 GB.

The filter cache only holds about 500 MB.

What is using so much of the heap? Does it simply take that much because of the number of documents? I feel like I'm missing some understanding here. Let me know what statistics you need and I'll post them. Thanks so much.
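One way to narrow down what is holding the heap is to compare the per-node memory counters returned by the node stats API (`GET /_nodes/stats`). The sketch below is a hypothetical helper, `heap_breakdown`, that takes the already-parsed JSON response and pulls out a few counters. The field names are the ones I recall from the 1.x node stats output and should be checked against a real response; the sample numbers are made up for illustration.

```python
MB = 1024 * 1024


def heap_breakdown(node_stats):
    """Summarise heap-related counters per node from a parsed /_nodes/stats response."""
    summary = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        summary[node.get("name", node_id)] = {
            # Overall JVM heap pressure on this node.
            "heap_used_percent": node.get("jvm", {}).get("mem", {}).get("heap_used_percent"),
            # Field data loaded on heap for sorting and aggregations.
            "fielddata_mb": indices.get("fielddata", {}).get("memory_size_in_bytes", 0) // MB,
            "filter_cache_mb": indices.get("filter_cache", {}).get("memory_size_in_bytes", 0) // MB,
            # Memory held by Lucene segments, which grows with the number of shards.
            "segments_mb": indices.get("segments", {}).get("memory_in_bytes", 0) // MB,
        }
    return summary


# Made-up sample shaped like a node stats response (assumption, not real data).
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 97}},
            "indices": {
                "fielddata": {"memory_size_in_bytes": 3072 * MB},
                "filter_cache": {"memory_size_in_bytes": 500 * MB},
                "segments": {"memory_in_bytes": 2048 * MB},
            },
        }
    }
}

for name, counters in heap_breakdown(sample).items():
    print(name, counters)
```

If field data and the filter cache together account for only a small part of the heap, segment memory and per-shard overhead are the next things to look at.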


(Camilo Sierra) #2

Do you use doc_values? They can help you reduce heap usage!


(Brian) #3

Yes, I'm using doc_values in all the not_analyzed places I can. Is it possible it's just too many shards?

I have some indexes that can go up to about 100GB a day while others are just 1GB a day. The 100GB ones I have going to 8 shards and there are 30 of them. Should it just be 1 or 2 shards? Is 100GB fine for 1 shard?


(Camilo Sierra) #4

You can also disable field data loading; field data can take a lot of RAM.

https://www.elastic.co/guide/en/elasticsearch/reference/1.5/fielddata-formats.html#_disabling_field_data_loading
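As a rough sketch of what the linked page describes, the mapping fragment below is written as a Python dict so it can be serialised with `json`. It marks one analyzed string field with `"fielddata": {"format": "disabled"}` and one `not_analyzed` field with `doc_values`. The type name `logs` and the field names `message` and `client_ip` are hypothetical, and the settings follow the 1.x mapping syntax as I understand it, so verify them against the 1.5 reference before applying.

```python
import json

# Hypothetical type and field names; only the settings are the point here.
mapping = {
    "logs": {
        "properties": {
            # Analyzed text: refuse to load field data onto the heap.
            "message": {
                "type": "string",
                "fielddata": {"format": "disabled"},
            },
            # Exact-value field: keep values on disk as doc values instead of on heap.
            "client_ip": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
        }
    }
}

# JSON body that could be sent with a put-mapping request or placed in an index template.
body = json.dumps(mapping, indent=2)
print(body)
```

With field data disabled, sorting or aggregating on that field returns an error instead of loading it into memory, so this only suits fields that are searched but never sorted or aggregated on.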


(Camilo Sierra) #5

@warkolm already discussed the number of shards and heap usage in this post: Large heap usage with each node

That's the problem then. Each shard is a Lucene instance, and it requires resources to maintain.

Reduce that to a reasonable number and you should see better resource usage.
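To put the numbers from the first post in perspective, here is a back-of-the-envelope calculation. It assumes the 2656 shards are spread evenly across the 4 nodes, that 1 TB = 1024 GB, and that the 6.57 TB figure covers all of those shards; none of that is confirmed in the thread, so treat the results as rough.

```python
# Figures quoted in the original post.
total_shards = 2656
nodes = 4
heap_gb_per_node = 8
total_data_tb = 6.57

# Assumption: shards are balanced evenly across nodes.
shards_per_node = total_shards / nodes

# Assumption: 1 TB = 1024 GB.
avg_shard_gb = total_data_tb * 1024 / total_shards

# Heap available per shard if the whole heap were divided among shards.
heap_mb_per_shard = heap_gb_per_node * 1024 / shards_per_node

# A 100 GB daily index split into 8 primary shards, as described in post #3.
big_index_shard_gb = 100 / 8

print(f"shards per node:         {shards_per_node:.0f}")
print(f"average shard size (GB): {avg_shard_gb:.2f}")
print(f"heap per shard (MB):     {heap_mb_per_shard:.1f}")
print(f"100 GB index, 8 shards:  {big_index_shard_gb:.1f} GB per shard")
```

Under these assumptions each node carries several hundred small shards on an 8 GB heap, which is consistent with the advice above to reduce the shard count.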

