Out of memory on cluster with almost no data


(Wojciech Durczyński) #1

Hello Shay.

I have an out-of-memory problem in ES 0.17.5.
My cluster contains two data nodes and a lot of short-lived "test nodes"
that don't store any data and only connect to the cluster to perform some
operations on it.
Every "test node" creates its own index with some types and a complex mapping
and cleans this index before shutdown.
After a while my cluster breaks: one of the data nodes throws an OOM. At that
point it contains ~100 empty indices.
I analyzed a heap dump of the broken node, and almost all of the memory is used
by the thread "elasticsearch[Manikin]clusterService#updateTask-pool-11-thread-1",
in the variable "workQueue", which is of type LinkedBlockingQueue and contains
1798 items of type org.elasticsearch.cluster.service.InternalClusterService$2,
~640kB each,
where updateTask.newState.metaData.indices has size ~491kB and
updateTask.newState.routingTable.indicesRouting has size ~102kB.
If each update task contains information about all indices and their mappings,
then its size is OK, but why are there so many update tasks in this queue?
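
To put the heap dump figures in perspective, here is a rough back-of-the-envelope estimate in Java. It assumes every queued task retains its own ~640kB of state rather than sharing it with other tasks, which the dump does not confirm. The class and method names are made up for illustration.

```java
public class QueueHeapEstimate {
    // Figures taken from the heap dump described above.
    static final long QUEUED_TASKS = 1798;
    static final long TASK_SIZE_KB = 640;

    /** Total heap retained by the queue, in kB, assuming no sharing between tasks. */
    static long retainedKb(long tasks, long sizePerTaskKb) {
        return tasks * sizePerTaskKb;
    }

    public static void main(String[] args) {
        long totalKb = retainedKb(QUEUED_TASKS, TASK_SIZE_KB);
        double totalGb = totalKb / (1024.0 * 1024.0);
        System.out.println("queue retains about " + totalKb + " kB (" + totalGb + " GB)");
    }
}
```

Under that assumption the queue alone holds about 1,150,720 kB, roughly 1.1 GB, which would explain the OOM even though the indices themselves are empty.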


(Shay Banon) #2

What exactly do you do on each "test node" that connects? Also, is that a
client node that connects to the cluster?

The thread mentioned is the one responsible for applying cluster-wide
changes on the master node. For example, when an index is created, it
handles the data structures associated with it, and shard placements (not
actually moving / placing them, just changing the in-memory data structures
representing them).
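
A minimal sketch of how a backlog like the one in the heap dump can build up. This is not Elasticsearch's actual implementation: it only assumes, based on the thread name and the LinkedBlockingQueue seen in the dump, that update tasks go through a single worker thread with an unbounded queue. The class name, method name and payload size are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UpdateTaskBacklog {

    /**
     * Submits `tasks` jobs to a single-thread executor whose worker is busy,
     * and returns how many of them are left waiting in the queue.
     */
    static int queuedWhileBusy(int tasks, int payloadBytes) {
        // One worker thread, unbounded queue: submissions never block or fail,
        // they simply accumulate while the worker is occupied.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Stand-in for a slow update task that keeps the worker busy.
            executor.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await(); // make sure the worker has picked up the slow task

            for (int i = 0; i < tasks; i++) {
                // Each queued task keeps its own payload reachable until it runs,
                // much like each update task holding a "newState".
                final byte[] snapshot = new byte[payloadBytes];
                executor.execute(() -> {
                    int ignored = snapshot.length;
                });
            }
            return executor.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while filling the queue", e);
        } finally {
            release.countDown(); // let the worker drain the queue
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued tasks: " + queuedWhileBusy(100, 64 * 1024));
    }
}
```

If tasks are submitted faster than the single thread can apply them, nothing in this setup pushes back on the submitters, so the heap used by the queue grows with the number of pending tasks.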

2011/9/26 Wojciech Durczyński wojciech.durczynski@comarch.com


(Wojciech Durczyński) #3

"Test nodes" are of course client nodes.
They connect to the cluster and create one or two indices with about 10 types.
Then they index documents there (5000 at most), execute some queries, delete
the data in the created indices and disconnect.
After ~100 such runs the cluster dies with an OOM.


(Shay Banon) #4

How do they delete the data? Do they delete the content, or the indices?

2011/9/27 Wojciech Durczyński wojciech.durczynski@comarch.com

(Wojciech Durczyński) #5

They delete content only.


(Shay Banon) #6

Can you post the code you use, or the step-by-step actions you perform?

