Very slow Get API requests on 0.18.5


(Colin Surprenant) #1

Hi,

I am noticing very slow (timing out) Get API requests on my 0.18.5
cluster. These are simple http://{host}/{index}/{type}/{id} requests.
Everything else is flowing: bulk inserts, search requests, etc.
There is absolutely nothing in the logs (at DEBUG level) and no failures
on the cluster. Load and memory are OK on all nodes.

In fact, if I do a search using
http://{host}/{index}/{type}/_search?q={id} it is very fast.

Any idea what I should look for to diagnose this?

Thanks,
Colin


(Colin Surprenant) #2

More information after some more testing:

My index has 2 shards and 2 replicas on a 6-node cluster.

I tested on all 6 nodes and it systematically gives the same result:

Using no ?preference= or using ?preference=_local: the first 2
requests are successful, the 3rd one times out.

Using ?preference=_primary always works.
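
The test sequence above could be scripted roughly as follows. This is only a sketch, not the commands actually used in this thread: the helper names `get_url` and `probe`, the 10-second timeout, and the sample host, index, type and id are all assumptions for illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def get_url(host, index, doc_type, doc_id, preference=None):
    """Build a Get API URL of the form http://{host}/{index}/{type}/{id}."""
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    url = "http://%s/%s" % (host, path)
    if preference is not None:
        # Appends e.g. ?preference=_local or ?preference=_primary
        url += "?" + urlencode({"preference": preference})
    return url


def probe(url, attempts=3, timeout=10.0, fetch=urlopen):
    """Send the same Get request several times and record each outcome."""
    outcomes = []
    for _ in range(attempts):
        try:
            with fetch(url, timeout=timeout) as response:
                outcomes.append("ok %d" % response.status)
        except Exception as exc:  # timeouts, refused connections, HTTP errors
            outcomes.append("failed: %s" % type(exc).__name__)
    return outcomes
```

For example, calling `probe(get_url("localhost:9200", "myindex", "mytype", "1", pref))` for `pref` in `None`, `"_local"` and `"_primary"` on each node would show which request in the sequence fails for each preference.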

Colin


(Shay Banon) #3

Do you have a load balancer in front of your client (or a load-balancing
client)? When you say you run it with preference set to _local 3 times
and the 3rd one times out, do you always hit the same node? Are you running
it with curl directly against the nodes?


(Colin Surprenant) #4

I did the Get tests directly on each node using curl on localhost, so
yes, when doing the sequence of 3 Get requests, it was always hitting the
same node (localhost).

Colin


(Shay Banon) #5

And when you executed it on a specific node, you ran it 3 times with
preference set to _local, and the third time it would time out (which
exception do you get)? Does this behavior happen on all nodes?


(Colin Surprenant) #6

Exactly, and yes it was the same on all 6 nodes.

Unfortunately, I don't have the exception. When I actually did the
tests on each node, I didn't wait for the exception to be thrown and I
interrupted curl after waiting for a while. I also looked in the logs
and could not find anything.

Now I cannot reproduce the problem after rebooting 3 nodes this
weekend (EC2 maintenance events).

Colin


(Shay Banon) #7

Strange... If it happens again, ping me on IRC and let's try to debug it
"online". I'm not really sure why it happened.


(Colin Surprenant) #8

Ok, thanks, will do.

Colin


(Colin Surprenant) #9

Tried to ping you on IRC.

The problem has reappeared with some documents, and this time using
?preference=_primary does not solve it. In fact, it consistently fails
using _primary.

So the latest behaviour is: systematically 2 failures and 1 success when
using no ?preference= or using ?preference=_local, and it always fails
with ?preference=_primary.

When I say fail, it actually just hangs, and after 10 minutes my curl
request hasn't generated any log/exception on the local node or on the
master node.

Any suggestion?

Colin


(Shay Banon) #10

Sorry I missed you. When you say failure, do you mean you get an exception,
or that you don't get a response? (I assume you are still using curl.)


(Colin Surprenant) #11

The curl request just hangs: no timeout, no exception, and nothing in the
logs at DEBUG level or above.
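
One way to tell "an exception" apart from "no response at all" is to set a client-side timeout and classify what comes back. The sketch below is an illustration only: the function name `classify`, its return labels, and the 30-second default are assumptions, and it uses Python's standard urllib rather than the curl commands used in this thread.

```python
import socket
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def classify(url, timeout=30.0, fetch=urlopen):
    """Return 'response', 'http-error', 'no-response' or 'connection-error'."""
    try:
        with fetch(url, timeout=timeout) as response:
            response.read()  # a hang can also occur while reading the body
            return "response"
    except HTTPError:
        # The node answered, but with an error status (checked before
        # URLError because HTTPError is a subclass of it).
        return "http-error"
    except (socket.timeout, TimeoutError):
        # Nothing arrived within the client-side timeout.
        return "no-response"
    except URLError as exc:
        # urlopen wraps connect-time failures, including timeouts, in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "no-response"
        return "connection-error"
```

A result of `no-response` on a Get URL that hangs, alongside `response` for the equivalent _search URL, would match the behaviour described here.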

