Elasticsearch 1.5.2 deployment issue

I have an ES 1.5.2 cluster with the following specs:

  • 3 nodes, each with 32GB RAM and 8 CPU cores
  • 282 total indices
  • 2,564 total shards
  • 799,505,935 total docs
  • 767.84GB total data
  • ES_HEAP_SIZE=16g

The problem is that when I query something through Kibana (very simple queries), a single query works fine, but if I keep issuing more queries, Elasticsearch becomes very slow and eventually gets stuck, because JVM heap usage (as reported by Marvel) climbs to 87-95%. The same happens when I try to load some Kibana dashboards, and the only way out of this state is to restart the service on all the nodes.
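From what I read, the fielddata cache is unbounded by default in 1.x, and capping it should at least keep the nodes alive; something like this in elasticsearch.yml (the percentages here are just guesses on my part, not values we run):

  # elasticsearch.yml -- bound the fielddata cache so old entries get
  # evicted instead of piling up until the heap is full
  indices.fielddata.cache.size: 40%

  # circuit breaker: reject requests that would push fielddata past
  # this limit (the default is 60% of the heap)
  indices.breaker.fielddata.limit: 50%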

(This also happens on ES 2.2.0 with Kibana 4.)

What is wrong, what am I missing?
Am I supposed to query less?

EDIT:

I should have mentioned that I have a lot of empty indices (0 documents), but their shards still count toward the total. It is this way because I set a TTL of 4w on the documents, and the empty indices will later be deleted with Curator.
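For reference, TTL in 1.x/2.x is enabled per mapping; ours is set up roughly like this (the template name, index pattern, and default type are placeholders, not our exact config):

  curl -XPUT 'http://localhost:9200/_template/ttl_4w' -d '{
    "template": "logs-*",
    "mappings": {
      "_default_": {
        "_ttl": { "enabled": true, "default": "4w" }
      }
    }
  }'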

Also, we have not disabled doc_values on either the 1.5.2 or the 2.2.0 cluster.
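(On 1.5.2 they are not on by default, though: doc_values have to be enabled per field in the mapping, and only became the default in 2.x. Enabling them for a not_analyzed string would look roughly like this; the index, type, and field names are only examples, and it only applies to newly created indices, so existing data would need a reindex:

  curl -XPUT 'http://localhost:9200/logs-example/_mapping/event' -d '{
    "properties": {
      "device": {
        "properties": {
          "imei": { "type": "string", "index": "not_analyzed", "doc_values": true }
        }
      }
    }
  }')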
The accurate specs are as follows (1.5.2):

  • 3 nodes, each with 32GB RAM and 8 CPU cores
  • 282 total indices = 227 empty + 31 marvel + 1 kibana + 23 data
  • 2,564 total shards = (1135 empty + 31 marvel + 1 kibana + 115 data) primaries × 2 (1 replica each)
  • 799,505,935 total docs
  • 767.84GB total data
  • ES_HEAP_SIZE=16g

curl _cat/fielddata?v result:

2.2.0:

 field                                               node 1   node 2   node 3
 total                                                2.1gb    1.9gb      2gb
 os.cpu.usage                                         1.2mb    1.2mb       0b
 primaries.indexing.index_total                       3.5mb    3.4mb       0b
 total.fielddata.memory_size_in_bytes                 3.4mb    3.3mb       0b
 jvm.mem.heap_used_percent                            1.1mb    1.1mb       0b
 jvm.gc.collectors.young.collection_time_in_millis       0b    1.5mb       0b
 primaries.docs.count                                 3.5mb    3.5mb       0b
 device.imei                                          2.1gb    1.9gb      2gb
 fs.total.available_in_bytes                          1.9mb    1.9mb       0b
 os.load_average.1m                                   1.8mb    1.8mb       0b
 index.raw                                            3.6mb    3.5mb       0b
 @timestamp                                           3.6mb    3.6mb       0b
 node.ip_port.raw                                     1.7mb    1.7mb       0b
 fs.total.disk_io_op                                  1.9mb    1.9mb       0b
 node.name                                            1.7mb    1.7mb       0b
 jvm.mem.heap_used_in_bytes                           1.6mb    1.5mb       0b
 jvm.gc.collectors.old.collection_time_in_millis      1.5mb    1.5mb       0b
 total.merges.total_size_in_bytes                     3.5mb    3.4mb       0b
 jvm.gc.collectors.young.collection_count             1.5mb       0b       0b
 jvm.gc.collectors.old.collection_count               1.5mb    1.5mb       0b
 total.search.query_total                             3.2mb    3.2mb       0b

1.5.2:

 field                              size
 total                           176.2mb
 index_stats.index                    0b
 node.id                              0b
 node_stats.node_id                   0b
 buildNum                           232b
 endTime                         213.5kb
 location.timestamp              518.8kb
 userActivity.time               479.7kb
 startTime                        45.5mb
 time                             80.1mb
 shard.state                       1.4kb
 shard.node                         920b
 indoorOutdoor.time              348.7kb
 shard.index                       2.5kb
 dataThroughput.downloadSpeed     49.1mb

curl /_nodes/stats result gist
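One thing I can try short of a full restart: if fielddata is what fills the heap, the clear-cache API should drop it (assuming the default port):

  curl -XPOST 'http://localhost:9200/_cache/clear?fielddata=true'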

That's really too many shards per node IMO.

Imagine a shard as if it were a database. Would you run around 1,000 database instances on a single node?

So either decrease the number of shards, or increase the number of nodes.
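For time-based indices that usually means an index template with a lower shard count; a sketch, with the pattern and numbers purely as illustration:

  curl -XPUT 'http://localhost:9200/_template/fewer_shards' -d '{
    "template": "logs-*",
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }'

Your 115 data shards are 23 indices × the default 5 primaries; on 3 nodes, 1 or 2 primaries per index is plenty.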

My 2 cents

Please see my EDIT.

  1. Get rid of the empty indices! They are a massive waste.
  2. Why bother with TTL if you are curating the indices? It too is a waste of resources.
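Index-level deletes with Curator already cover retention on their own; a sketch, assuming Curator 3.x CLI syntax and daily indices:

  # drop indices older than 28 days -- the same 4-week horizon as the TTL
  curator --host localhost delete indices --older-than 28 --time-unit days --timestring '%Y.%m.%d'

TTL, by contrast, deletes document by document, which keeps Lucene busy purging and merging; once Curator handles deletion you can disable _ttl in the mappings.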