Very slow Get API requests on 0.18.5

On Fri, Dec 2, 2011 at 12:27 PM, Colin Surprenant wrote:

Hi,

I am seeing very slow (timing out) Get API requests on my 0.18.5
cluster. These are simple http://{host}/{index}/{type}/{id} requests.
Everything else is flowing: bulk inserts, search requests, etc.
There is nothing in the logs (at DEBUG level) and no failures on the cluster.
Load and memory are OK on all nodes.

In fact, if I do a search using
http://{host}/{index}/{type}/_search?q={id} it is very fast.

Any idea what I should look for to diagnose this?

Thanks,
Colin

On Fri, Dec 2, 2011 at 8:49 PM, Colin Surprenant wrote:

More information after some more testing:

My index has 2 shards and 2 replicas on a 6-node cluster.

I tested on all 6 nodes and it systematically gives the same result:

With no ?preference= or with ?preference=_local, the first 2
requests succeed and the 3rd one times out.

With ?preference=_primary, it always works.

Colin
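
The test sequence described above can be scripted so that every attempt gets the same client-side timeout. The following is only a minimal sketch using the Python standard library: the helper names (get_url, fetch_ok, probe), the 10-second timeout, and the host, index, type, and id values are illustrative assumptions, not something taken from the thread.

```python
import urllib.request
from urllib.parse import quote, urlencode


def get_url(host, index, doc_type, doc_id, preference=None):
    """Build a Get API URL, optionally with a ?preference= parameter."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = f"http://{host}/{path}"
    if preference is not None:
        url += "?" + urlencode({"preference": preference})
    return url


def fetch_ok(url, timeout=10.0):
    """Return True if the server answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def probe(host, index, doc_type, doc_id, attempts=3, fetch=fetch_ok):
    """Run `attempts` sequential Gets per preference setting; collect outcomes."""
    results = {}
    for preference in (None, "_local", "_primary"):
        url = get_url(host, index, doc_type, doc_id, preference)
        results[preference] = [fetch(url) for _ in range(attempts)]
    return results
```

Calling probe("localhost:9200", "myindex", "mytype", "1") on each node (the port and names here are assumed) returns one list of True/False outcomes per preference setting, which makes a pattern such as "two successes, then a timeout" easy to compare across nodes.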

On Sun, Dec 4, 2011 at 9:15 AM, Shay Banon wrote:

Do you have a load balancer in front of your client (or a load-balancing
client)? When you say you ran it 3 times with preference set to _local
and the 3rd one timed out, did you always hit the same node? Are you running
it with curl directly against the nodes?

On Sun, Dec 4, 2011 at 6:17 PM, Colin Surprenant wrote:

I ran the Get tests directly on each node using curl against localhost, so
yes, the sequence of 3 Gets always hit the same node (localhost).

Colin

On Sun, Dec 4, 2011 at 2:06 PM, Shay Banon wrote:

And when you executed it on a specific node, you ran it 3 times with
preference set to _local, and the third time it would time out (which
exception do you get)? Does this behavior happen on all nodes?

On Mon, Dec 5, 2011 at 8:39 PM, Colin Surprenant wrote:

Exactly, and yes, it was the same on all 6 nodes.

Unfortunately, I don't have the exception. When I ran the tests on
each node, I didn't wait for an exception to be thrown; I interrupted
curl after waiting for a while. I also looked in the logs and could
not find anything.

Now I cannot reproduce the problem after rebooting 3 nodes this
weekend (EC2 maintenance events).

Colin

On Mon, Dec 5, 2011 at 2:28 PM, Shay Banon wrote:

Strange. If it happens again, ping me on IRC and let's try to debug it
"online"; I'm not really sure why it happened.

On Mon, Dec 5, 2011 at 2:37 PM, Colin Surprenant wrote:

OK, thanks, will do.

Colin

On Fri, Dec 9, 2011 at 11:42 PM, Colin Surprenant wrote:

Tried to ping you on IRC.

The problem has reappeared with some documents, and this time using
?preference=_primary does not solve it; in fact, it consistently fails
with _primary.

So the latest behaviour is: systematically 2 failures and 1 success with
no ?preference= or with ?preference=_local, and always a failure with
?preference=_primary.

When I say fail, the request actually just hangs, and after 10 minutes my
curl request hasn't generated any log entry or exception on the local node
or on the master node.

Any suggestion?

Colin

On Fri, Dec 9, 2011 at 4:44 PM, Shay Banon wrote:

Sorry I missed you. When you say failure, do you mean you get an exception,
or that you don't get a response? (I assume you are still using curl.)

The curl request just hangs: no timeout, no exception, and nothing logged at DEBUG level or above.
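
The distinction asked about here (an error response versus no response at all) can be made explicit on the client side by giving the request a timeout and labelling how it ended. This is a hedged sketch using only the Python standard library; the function name classify_get, the 30-second default, and the injectable opener parameter are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request


def classify_get(url, timeout=30.0, opener=urllib.request.urlopen):
    """Return a short label describing how one HTTP GET ended."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"response: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The node answered, but with an error status (e.g. 404 or 500).
        return f"http error: {err.code}"
    except urllib.error.URLError as err:
        # urllib wraps failures raised while connecting or sending.
        if isinstance(err.reason, socket.timeout):
            return "no response: timed out"
        return f"connection failed: {err.reason}"
    except socket.timeout:
        # The connection was open but no reply arrived within `timeout`.
        return "no response: timed out"
```

A hang like the one described would show up as "no response: timed out" rather than as an HTTP error, which narrows the question to why the node never replies.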
