Huge variance in query time


(Mike Peters) #1

Hi,

I'm trying to understand why queries some time respond instantly and
some time take a very long time to complete.

No major load on elasticsearch, using 6 servers, 3 shards.

We run the same exact query in a loop and while it's usually very
fast, every minute or so, the query time jumps to above 5 seconds.

Any ideas what could possibly be causing this?

See below example of running the same query twice, 1 second apart.
First time it's fast, then it gets slow, then fast again.

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"false","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"filter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,

--

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"fase","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"filter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 5845,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,


(Karussell) #2

Any ideas what could possibly be causing this?

garbage collection?

We run the same exact query in a loop

which features do you use in the query? how many queries / second and
how many RAM do you have?

Peter.

On 12 Jan., 16:42, Mike Peters m...@softwareprojects.com wrote:

Hi,

I'm trying to understand why queries some time respond instantly and
some time take a very long time to complete.

No major load on elasticsearch, using 6 servers, 3 shards.

We run the same exact query in a loop and while it's usually very
fast, every minute or so, the query time jumps to above 5 seconds.

Any ideas what could possibly be causing this?

See below example of running the same query twice, 1 second apart.
First time it's fast, then it gets slow, then fast again.

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"false","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil ter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,

--

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"fase","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil ter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 5845,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,


(Shay Banon) #3

Another option is file system cache. You are using a prefix query that can
be quite heavy to execute, and its much more affected by file system cache
compared to other queries.

Start by monitoring memory usage and work from there.

On Fri, Jan 13, 2012 at 12:01 PM, Karussell tableyourtime@googlemail.comwrote:

Any ideas what could possibly be causing this?

garbage collection?

We run the same exact query in a loop

which features do you use in the query? how many queries / second and
how many RAM do you have?

Peter.

On 12 Jan., 16:42, Mike Peters m...@softwareprojects.com wrote:

Hi,

I'm trying to understand why queries some time respond instantly and
some time take a very long time to complete.

No major load on elasticsearch, using 6 servers, 3 shards.

We run the same exact query in a loop and while it's usually very
fast, every minute or so, the query time jumps to above 5 seconds.

Any ideas what could possibly be causing this?

See below example of running the same query twice, 1 second apart.
First time it's fast, then it gets slow, then fast again.

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"false","fields":

["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil
ter":

{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,

--

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"fase","fields":

["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil
ter":

{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 5845,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,


(Mike Peters) #4

We've eliminated file-system-cache as the issue (iostat is good) and
memory stays stable at 20% utilization

Load is very low too

Any other ideas?

Any way to enable a verbose debug log, so that we can see where
elasticsearch is spending the time while processing these type of
queries?

Mike

On Jan 13, 5:01 am, Karussell tableyourt...@googlemail.com wrote:

Any ideas what could possibly be causing this?

garbage collection?

We run the same exact query in a loop

which features do you use in the query? how many queries / second and
how many RAM do you have?

Peter.

On 12 Jan., 16:42,MikePetersm...@softwareprojects.com wrote:

Hi,

I'm trying to understand why queries some time respond instantly and
some time take a very long time to complete.

No major load on elasticsearch, using 6 servers, 3 shards.

We run the same exact query in a loop and while it's usually very
fast, every minute or so, the query time jumps to above 5 seconds.

Any ideas what could possibly be causing this?

See below example of running the same query twice, 1 second apart.
First time it's fast, then it gets slow, then fast again.

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"false","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil ter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,

--

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"fase","fields":
["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil ter":
{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 5845,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,


(Shay Banon) #5

What are you running on? This reminds me of the problem we had once on AWS
with Ubuntu 10.04, where suddenly a GC will take a really long time,
without any user time spent on it, which ended up a problem with AWS and
Ubuntu. I suggest you enable GC logging (see elaticsearch.in.sh) and see
what happens during the time that the query takes time.

On Tue, Jan 17, 2012 at 5:35 PM, Mike Peters mike@softwareprojects.comwrote:

We've eliminated file-system-cache as the issue (iostat is good) and
memory stays stable at 20% utilization

Load is very low too

Any other ideas?

Any way to enable a verbose debug log, so that we can see where
elasticsearch is spending the time while processing these type of
queries?

Mike

On Jan 13, 5:01 am, Karussell tableyourt...@googlemail.com wrote:

Any ideas what could possibly be causing this?

garbage collection?

We run the same exact query in a loop

which features do you use in the query? how many queries / second and
how many RAM do you have?

Peter.

On 12 Jan., 16:42,MikePetersm...@softwareprojects.com wrote:

Hi,

I'm trying to understand why queries some time respond instantly and
some time take a very long time to complete.

No major load on elasticsearch, using 6 servers, 3 shards.

We run the same exact query in a loop and while it's usually very
fast, every minute or so, the query time jumps to above 5 seconds.

Any ideas what could possibly be causing this?

See below example of running the same query twice, 1 second apart.
First time it's fast, then it gets slow, then fast again.

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"false","fields":

["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil
ter":

{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 45,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,

--

curl -XGET "http://localhost:5004/spi/storeproducts/_search?
pretty=true" -d '{"from":"0","size":"20","sort":{"_score":""},"query":
{"filtered":{"query":{"query_string":
{"query":"ppc*","phraseSlop":"0","useDisMax":"fase","fields":

["name.s^20","emailaddress.s^15","emailaddress_aliases.s^10","_all"]}},"fil
ter":

{"and":[{"term":{"_account_id":"1429"}}]}}}}'
{
"took" : 5845,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 6,
"max_score" : 0.03996804,


(system) #6