On the good node all of the above works, but so does search (!). It is important to note that both nodes are needed for searching, as each holds part of the data.
I think this means it is not a networking issue (running out of sockets etc.) because of #3. It's also not something structural with ES (like garbage collection etc.), because searches through the healthy node work just fine.
Does anyone recognise the pattern, or have ideas on how to proceed with debugging?
Thanks,
Boaz
Are you sure you have enabled GC monitoring and that there is nothing in the log? It looks like your node has resource shortage issues. I recommend firing up BigDesk for an impression of the resource usage, or jvisualvm for direct JVM examination, to monitor resource usage and see what is going on.
Jörg
On 29.04.13 19:42, Boaz Leskes wrote:
Yes. Afterwards everything is OK for about 4 hours, and then it happens again.
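To Jörg's point about GC visibility, a minimal sketch of what enabling it could look like, assuming a HotSpot JVM and a startup script that honours ES_JAVA_OPTS; the gc.log path and the logging.yml tweak are illustrative, not taken from this thread:

  # Standard HotSpot GC logging, picked up when the node starts.
  export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/elasticsearch/gc.log"

  # Elasticsearch's own slow-GC warnings come from the monitor.jvm logger;
  # in config/logging.yml something like the following surfaces shorter pauses too:
  #   logger:
  #     monitor.jvm: DEBUG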
On 29.04.13 20:01, Boaz Leskes wrote:
I had it under visualvm and GC looks good. Also intra-node searches coming from the other node work fine. What other resources should I look at?
On Mon, Apr 29, 2013 at 9:19 PM, Jörg Prante <joerg...@gmail.com> wrote:
No good ideas. Just shooting in the dark. Maybe an issue with certain queries? Cache config? Shard distribution? Thread pools?
Jörg
On Tuesday, May 21, 2013 12:42:00 PM UTC+2, Itamar Syn-Hershko wrote:
We are seeing similar behavior: the following curl command will time out when executed twice in a row, and pretty much sporadically otherwise:
The index has 2 replicas, and we observe this happening also for non-REST clients (native Java clients), so this is probably something related to threading on the server?
Boaz, were you able to get to any resolution on this?
On Tue, May 21, 2013 at 1:56 PM, Boaz Leskes <b.le...@gmail.com> wrote:
Hi Itamar,
We ended up giving the nodes some more memory and it stopped happening. I was blindly guessing that memory might help because it started happening after we rolled out a memory-intensive feature. Can't say I'm happy with that kind of "solution".
You mention the query is a reliable way to reproduce the issue. Can you also supply some sample data? Do you need to run in a multi-node scenario, or is one node also enough?
Boaz
On Tue, May 21, 2013 at 1:12 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
What do you mean "giving"? Are you running on a VM, or was it a setting you changed?
It's any query, as a matter of fact. We have a multi-node setup and that specific index has 2 replicas, so it should be spread between them (though I admit I believed the implementation and haven't checked that).
I've never seen this happen in our tests, only on our live cluster where the setup is multi-node. I believe any such setup, with any data, with multiple clients accessing it concurrently, will eventually get to that faulty state. I'll be happy to work with you and the ES team on reproducing and killing this issue. I believe it should be escalated and resolved quickly - it's definitely a show stopper.
Another big issue I'm seeing at this point is ES not respecting timeouts - I recall Shay saying it's on a best-effort basis. If ES reliably timed out lengthy requests, this starvation scenario wouldn't happen.
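As an illustration of the two timeout layers in play here, a sketch assuming a node on localhost:9200 and a hypothetical index name; the search-level timeout is the best-effort one mentioned above, while the curl-level timeout at least keeps the client from hanging indefinitely:

  # --max-time is a hard client-side cutoff; "timeout" in the body is ES's
  # best-effort search timeout, reported back via "timed_out" in the response.
  curl --max-time 5 -XPOST 'http://localhost:9200/myindex/_search?pretty' -d '{
    "timeout": "2s",
    "query": { "match_all": {} }
  }'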
On Tue, May 21, 2013 at 2:21 PM, Boaz Leskes <b.le...@gmail.com> wrote:
We gave the VM more memory (thus leaving less to the file caches).
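As a sketch of what that amounts to in practice, assuming a startup script that reads ES_HEAP_SIZE (the 4g value is just an example): the JVM heap grows, and whatever RAM is left over is what the OS can still use to cache the index files.

  # ES_HEAP_SIZE sets both -Xms and -Xmx for the node; a common rule of thumb
  # is to keep the heap at or below roughly half of RAM so the rest stays
  # available to the OS file cache.
  export ES_HEAP_SIZE=4g
  bin/elasticsearch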
It seems you guys are experiencing something slightly different from what we had - the number of connected clients in our case stayed the same. It might still be a resource problem, but of a very specific nature, as we couldn't reproduce it in tests either.
I'd love to try and help, but I need some more info. It would be awesome if you could give a recipe that consistently (or with a high success rate) gets it to block.
On Tue, May 21, 2013 at 1:28 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
The number of connected clients is the same - but it's larger than one, and they are actively working against the cluster, mostly issuing searches. Concurrent requests to the cluster seem to cause thread starvation. And we weren't able to reproduce this in tests yet.
I'll try to work on something reproducible. In the meantime - were you able to find anything in the logs?
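If thread starvation is the suspect, a couple of read-only calls might show where the threads are stuck while the node is unresponsive, assuming a node reachable on localhost:9200 (the exact stats URL varies a bit between versions):

  # Dumps the busiest threads on every node - a saturated search thread pool
  # usually shows up here immediately.
  curl 'http://localhost:9200/_nodes/hot_threads'

  # Per-node thread pool counters (active / queue / rejected).
  curl 'http://localhost:9200/_nodes/stats?thread_pool=true&pretty'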
On Tue, May 21, 2013 at 2:33 PM, Boaz Leskes <b.le...@gmail.com> wrote:
No, sadly nothing in the logs.
Hopefully you can make it more concrete. I'll try to find some time to look at the code again later. What version are you running?
On Tuesday, May 21, 2013 1:35:54 PM UTC+2, Itamar Syn-Hershko wrote:
0.90 with some additional recent commits (we are compiling a fork, no
relevant changes to the core)
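For reference, the exact version a node is actually running can be read off its root endpoint, assuming the default HTTP port:

  # The root endpoint reports the node name and the version.number of the running build.
  curl 'http://localhost:9200/'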