Is there a memory issue? I cannot get stats directly from ES for that
node, but the OS shows plenty of memory:
$ cat /proc/meminfo
MemTotal: 24604156 kB
MemFree: 5863904 kB
I am running ES using the Tanuki wrapper with 16 GB allocated to the JVM:
-Xmx16384m
It appears that the JVM has used up all of its allocated heap, while the
rest of the system memory remains unused.
top - 11:40:21 up 53 days, 21:40, 1 user, load average: 0.25, 0.26, 0.20
Tasks: 128 total, 1 running, 127 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.0%us, 0.1%sy, 0.0%ni, 95.6%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24604156k total, 18740160k used, 5863996k free, 165644k buffers
Swap: 17203192k total, 12940k used, 17190252k free, 847548k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4595 elastics 20 0 18.6g 16g 10m S 73.2 70.2 840:12.11 java
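As a rough sanity check on the figures above, the sketch below compares the configured heap ceiling (-Xmx16384m) with the resident size implied by the 70.2 %MEM reading in top. The helper name parse_meminfo is hypothetical, and the resident size is only an estimate derived from the rounded percentage, so treat the result as approximate.

```python
KB_PER_MIB = 1024

def parse_meminfo(text):
    """Parse '/proc/meminfo'-style lines into a {name: kB} dict (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

# Figures copied from the /proc/meminfo and top output in this message.
meminfo = parse_meminfo("MemTotal: 24604156 kB\nMemFree: 5863904 kB")
mem_total_kb = meminfo["MemTotal"]

heap_max_kb = 16384 * KB_PER_MIB           # -Xmx16384m expressed in kB
java_res_kb = mem_total_kb * 70.2 / 100    # estimate from top's %MEM column

heap_share_pct = 100 * heap_max_kb / mem_total_kb
non_heap_mib = (java_res_kb - heap_max_kb) / KB_PER_MIB

print(f"heap ceiling as share of RAM: {heap_share_pct:.1f}%")
print(f"estimated resident memory beyond the heap ceiling: {non_heap_mib:.0f} MiB")
```

If the estimate holds, the process is resident at slightly more than the full heap ceiling, which would be consistent with the heap being fully committed plus some non-heap JVM overhead. It does not by itself show how much of that heap is live data.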
Or am I missing something? Is there anything of interest in the jstack
output?
Cheers,
Ivan
On Tue, Oct 2, 2012 at 11:25 AM, Shay Banon kimchy@gmail.com wrote:
I did not understand the misbehavior part, do you mean that shards fail to
relocate to it? Or the fact that it has memory problems?
On Oct 2, 2012, at 1:32 PM, Ivan Brusic ivan@brusic.com wrote:
I should add that the index with the shard issue is not being queried
against and not receiving any updates. It is merely an older version of the
current index.
The real question is not why these shards are failing to reallocate, but why
this node is misbehaving.
On Tue, Oct 2, 2012 at 10:19 AM, Ivan Brusic ivan@brusic.com wrote:
I have a 12-node cluster running 0.19.8 with two or three 100 GB indices, each
with six to eight shards and one replica. It is not in production, so there
are not many queries. One index gets bulk updated about every 2 hours.
One node in particular (srch-lv105, X30RJ0i-QFOfNrvHT291tw) has been
giving us trouble: it accepts connections but does not process them.
It also occasionally writes large (10 GB+) heap dumps.
After the last restart of that node (reallocation still enabled), two
nodes attempt to move shards to it, but they stall part way. There has been
no progress in the past day and the restarted node still contains no active
shards.
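One way to confirm a stall like this is to poll the cluster health endpoint twice and compare the shard counters. This is a minimal sketch, assuming a node is reachable at http://localhost:9200 (an assumption, not taken from this cluster); the function names are hypothetical, and unchanged counters are only a crude signal, since a shard can still be copying data while the counts stay the same.

```python
import json
from urllib.request import urlopen

COUNTERS = ("relocating_shards", "initializing_shards", "unassigned_shards")

def fetch_health(base_url="http://localhost:9200"):
    """Fetch /_cluster/health from a node (base_url is an assumed address)."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)

def recovery_summary(health):
    """Keep only the status and the shard counters relevant to relocation."""
    summary = {"status": health.get("status")}
    for key in COUNTERS:
        summary[key] = health.get(key, 0)
    return summary

def looks_stalled(before, after):
    """True if shards are still in flight but no counter changed between polls."""
    in_flight = after["relocating_shards"] + after["initializing_shards"]
    return in_flight > 0 and before == after
```

In use, one would call recovery_summary(fetch_health()) at two points some minutes apart and pass both results to looks_stalled.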
The gist provides the cluster stats, node stats, and the jstack of the
three servers involved in the reallocation.
Reallocation failure · GitHub
Cheers,
Ivan