Lack of memory?

Hi!

I've launched 5 ElasticSearch nodes (each on a different host) - 2 non-data
nodes and 3 data nodes. Then I started pushing a few GB of data into the
cluster. While doing this I saw in htop that the ES processes were using more
and more memory. After some time I see entries like this in the logs:

[13:34:45,762][WARN ][jgroups.FD ] I was suspected by
ggmail-test-5-49710; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:34:46,042][WARN ][jgroups.pbcast.NAKACK ] ggmail-test-3-55691:
dropped message from ggmail-test-5-49710 (not in xmit_table), keys are
[ggmail-test-2-44830, ggmail-test-3-55691, ggmail-test-2-29869,
ggmail-test-4-9592, ggmail-test-1-19291], view=[ggmail-test-1-19291|9]
[ggmail-test-1-19291, ggmail-test-4-9592, ggmail-test-2-29869,
ggmail-test-3-55691, ggmail-test-2-44830]
[13:34:54,373][WARN ][jgroups.pbcast.NAKACK ] ggmail-test-3-55691:
dropped message from ggmail-test-5-49710 (not in xmit_table), keys are
[ggmail-test-2-44830, ggmail-test-3-55691, ggmail-test-2-29869,
ggmail-test-4-9592, ggmail-test-1-19291], view=[ggmail-test-1-19291|9]
[ggmail-test-1-19291, ggmail-test-4-9592, ggmail-test-2-29869,
ggmail-test-3-55691, ggmail-test-2-44830]

and that:

[WARN ][monitor.jvm ] [Winky Man] Long GC collection occurred,
took [10s], breached threshold [10s]

I'm not able to put new data into ES (on an existing index and type). Curl
returns:

{
  "error" : "PrimaryNotStartedActionException[[mail_index1001][3] Timeout waiting for [1m]]"
}

In elasticsearch.in.sh I configured the Java heap space to 1GB. Bigger values
(like 3GB) only change the time after which this situation happens.
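
For reference, a rough sketch of that change (my copy of elasticsearch.in.sh
exposes the heap through the ES_MIN_MEM/ES_MAX_MEM variables, which end up as
-Xms/-Xmx on the java command line; the exact names may differ per version):

# in elasticsearch.in.sh - 1GB heap
ES_MIN_MEM=1g
ES_MAX_MEM=1g
# the startup script then launches roughly: java -Xms1g -Xmx1g ...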

I use this configuration on the data nodes:

cluster:
    name: LocalCluster060

node:
    data: true

http:
    enabled: false

discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        tcpping:
            initial_hosts: 10.1.112.46[9700], 10.1.112.58[9700], 10.1.112.47[9700], 10.1.112.59[9700], 10.1.112.214[9700]

What is the reason for this problem?

In general, this should not happen, as memory should be cleaned correctly.
This might indicate a memory leak (btw, which version are you using?). It
might just be that you are overloading the nodes and simply need to allocate
more memory per JVM (as you suggested, raising it to 3G).

How familiar are you with Java? Is there a chance I can ask for a memory dump
of the JVM so I can analyze it?

cheers,
shay.banon


2010/4/28 Shay Banon <shay.banon@elasticsearch.com>:

> In general, this should not happen, as memory should be cleaned correctly.
> This might indicate a memory leak (btw, which version are you using?)

I'm using version 0.6.0

> It might just be that you are overloading the nodes and simply need to
> allocate more memory per JVM (as you suggested, raising it to 3G).

Raising it to 3 GB will only delay the moment at which ES will "block" (I've
tested it)

> How familiar are you with Java? Is there a chance I can ask for a memory
> dump of the JVM so I can analyze it?

I'm not very familiar with Java, but maybe I will try to get a JVM memory dump

cheers,
Szymon


You can use jmap
(http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html). First, if
you can send me the histo output, maybe that would be enough... (jmap
-histo:live pid).
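
For example, something like this (with <pid> being the id of the ES java
process, e.g. taken from jps or ps):

jmap -histo:live <pid> > es-histo.txt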

thanks!
shay.banon


Hi!

The histo output from one of the data nodes is attached.

cheers


Hi,

Thanks! This certainly helps. Is there a chance that you can do a full heap
dump (it will be a large file) and place it somewhere that I can download it?
I appreciate the effort in helping nail this down...
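
For example (again <pid> is the ES process id; the resulting .hprof file can
be compressed before uploading):

jmap -dump:live,format=b,file=es-heap.hprof <pid>
gzip es-heap.hprof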

cheers,
shay.banon


Also, can you explain a bit about what you ran against the nodes? Did you
index large documents? Did you perform a lot of searches, or mainly indexing?


So, I tried to analyze it as best as I can, and I might have found a problem
that can explain your case. It basically revolves around the search context
associated with a search request not being cleaned eagerly (by default it only
gets cleaned automatically after 5 minutes of inactivity). So, if your system
executed a lot of search requests within that 5-minute window, it might
explain the problem. The issue is:
http://github.com/elasticsearch/elasticsearch/issues/issue/153.

cheers,
shay.banon


Hi!

Thanks for looking at this problem. I could make a full dump, but not earlier
than Tuesday, unfortunately.
This is what I'm doing to trigger the problem:

I'm putting a lot (a few thousand) of random messages into one index and one
type. A single message consists of a few values and one text value of
500B-100K in size; to allow highlighting, store and term_with_positions_offset
are enabled for this text value in the mapping of that (index, type). Then I
repeat this operation for the next index, and the next, and the next, ...
Two processes doing this are launched. After 20-60 indices (depending on
memory) the problem appears.

No search operations are launched, so it can't be the issue you suggested.
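
Each message is, roughly, a request of this shape (host, type and field names
here are only placeholders, not the real ones):

curl -XPUT 'http://<non-data-node>:9200/mail_index1001/message/1' -d '{
  "from"    : "someone@example.com",
  "subject" : "random subject",
  "body"    : "... 500B-100K of random text (stored, with term positions/offsets) ..."
}'

and the index name is incremented for each new batch.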

cheers,
Szymon Gwóźdź


I see. Is there a chance that you can send me the program you use, and I will
run it?

cheers,
shay.banon


By the way, another possible reason for this is that simply creating so many
indices (each with 5 shards and a replica) is too much on only 3 nodes. Each
shard instance is a Lucene index, and it does take its toll on memory usage.
To verify this, you can try changing your test to use types within a single
index instead of different indices, and see if you still hit the problem.
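
For example (paths are only illustrative), instead of:

curl -XPUT 'http://node:9200/mail_index1/message/1' -d '...'
curl -XPUT 'http://node:9200/mail_index2/message/1' -d '...'

something like:

curl -XPUT 'http://node:9200/mail_index/message_1/1' -d '...'
curl -XPUT 'http://node:9200/mail_index/message_2/1' -d '...'

so there is only one set of shards regardless of how many batches are indexed.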

cheers,
shay.banon
