What is causing random query time fluctuations?


(Nik Everett) #1

When I repeat the same search over and over again I get somewhat wild
fluctuations in run time: 4ms, 4ms, 87ms, 17ms, 157ms, 4ms, 4ms, 4ms

Any ideas what might cause this? This only seems to happen on my
production cluster so I'm not likely to be able to make full reproduction
steps. I'm causing it with a simple match query without any filters or
anything.

Normally I trace performance stuff by causing lots of traffic and looking
at hot_threads, but these fluctuations don't feel like they'd be easy
to catch that way.

I have 16 nodes on real hardware with somewhat slow disks and 30GB heaps
with 96GB total RAM. We write to the indexes constantly but slowly. Maybe
2-3 updates (not new docs, changes to old docs) per second per shard.

Any ideas would be great,

Thanks

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2hR-CDOsOy-o1Opqc8V6GWJ5Z_qRxAOsi2yrc_5RGBmQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

What is this query like? If it does sorting or faceting, for example, this
could be the field data being invalidated and loaded again, which would
point to low levels of free RAM. Phrase queries require loading term
position data, which is also cached but can get invalidated. And so on.

Also, this new feature may be worth a look:
https://github.com/elasticsearch/elasticsearch/blob/master/docs/reference/indices/benchmark.asciidoc

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Apr 11, 2014 at 2:24 PM, Nikolas Everett nik9000@gmail.com wrote:



(David Pilato) #3

Could this be caused by Lucene merges at some point? I guess also that when a segment is committed, it could take some time to compute the caches for the new segment?

Are you using warmers?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(Nik Everett) #4

Itamar:
It's really just a match query. No sorting or anything:
POST /enwiki_content/_search
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "main page",
      "fields": [
        "title.plain"
      ]
    }
  }
}

I mean, I do more complicated things most of the time, but I'm still able
to reproduce the fluctuations with the match query.

David:
I never turned on warmers or any eager loading. I'll give that a shot and
report back.

Nik



(Nik Everett) #5

Even simpler:
POST /enwiki_content/_search
{
  "_source": false,
  "query": {
    "match": {
      "title.plain": "main page"
    }
  }
}

I tried adding that query as a warmer and I still get fluctuations. I
tried raising the refresh interval to 30s and I still get them.
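For anyone wanting to repeat those two changes, here is a rough sketch of the requests involved. It only builds the method, path, and body and sends nothing. It assumes the index warmer endpoint of the 1.x line (`PUT /{index}/_warmer/{name}`, removed in later major versions) and the index settings endpoint; the warmer name `main_page_match` and the helper function names are made up for illustration.

```python
import json


def warmer_request(index, name, query):
    """Build the (method, path, body) triple that registers `query` as a warmer.

    Assumes the 1.x-era warmer endpoint; nothing is sent over the network.
    """
    return ("PUT", f"/{index}/_warmer/{name}", json.dumps({"query": query}))


def refresh_interval_request(index, interval):
    """Build the (method, path, body) triple that changes the refresh interval."""
    body = {"index": {"refresh_interval": interval}}
    return ("PUT", f"/{index}/_settings", json.dumps(body))


# The match query from this post, reused as the warmer body.
query = {"match": {"title.plain": "main page"}}

requests_to_send = [
    warmer_request("enwiki_content", "main_page_match", query),  # hypothetical warmer name
    refresh_interval_request("enwiki_content", "30s"),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(body)
```

Each triple can be passed to whatever HTTP client is already in use.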

I'm reading the times from the "took" field. I think this is OK because it
should eliminate things like network latency from the measurement.
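A minimal sketch of that measurement, for anyone who wants to collect more than a handful of samples: one helper runs a search and returns the server-reported `took`, another summarizes a list of samples. The base URL is an assumption, the function names are hypothetical, and the percentile is a simple nearest-rank one. The module-level part only summarizes the eight timings quoted at the top of the thread, so it runs without a cluster.

```python
import json
import math
import statistics
import urllib.request


def search_took(base_url, index, body):
    """Run one search and return the `took` value (milliseconds) from the response."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["took"]


def sample(run_once, n):
    """Call `run_once` n times and collect the returned timings."""
    return [run_once() for _ in range(n)]


def summarize(samples):
    """Return count, min, median, nearest-rank p95, and max of the samples."""
    ordered = sorted(samples)

    def percentile(p):
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": percentile(95),
        "max": ordered[-1],
    }


# The timings reported in the first post, in milliseconds.
reported = [4, 4, 87, 17, 157, 4, 4, 4]
print(summarize(reported))
```

Against a live cluster this could be driven with something like `sample(lambda: search_took("http://localhost:9200", "enwiki_content", body), 1000)`, where the host and port are assumptions and `body` is the match query above.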

Do you think eager norms loading would help? I wonder if I should look at
that next.

I wonder if this is just the kind of thing that'll average out over all the
queries, or whether it will get magnified. As in, could these bunch up and
turn into some kind of stampede?

On closer examination, Ganglia is telling me that young GCs average about
45ms on the production cluster and occur about every two seconds. That
might fit the pattern.

Also, I see the same delays when I manually hammer smaller indexes, just
less frequently.
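A back-of-the-envelope check of the GC idea, as a sketch under loud assumptions: young collections are treated as stop-the-world pauses of 45ms recurring every 2000ms, pauses on different nodes are treated as independent, a query is assumed to take 4ms and to arrive at a random time, and a request is assumed to be slow if any node it touches is paused. How many nodes a request actually touches is not stated in this thread, so the node counts below are purely illustrative.

```python
def overlap_probability(pause_ms, interval_ms, query_ms):
    """Chance that a query starting at a uniformly random time overlaps a pause.

    The query overlaps if it starts during the pause or within query_ms before it.
    """
    return min(1.0, (pause_ms + query_ms) / interval_ms)


def any_node_paused(p_single, nodes):
    """Chance that at least one of `nodes` independent nodes is hit by a pause."""
    return 1.0 - (1.0 - p_single) ** nodes


# Figures from this post: 45ms young GCs about every 2s, 4ms baseline queries.
p = overlap_probability(pause_ms=45, interval_ms=2000, query_ms=4)
print("single node:", round(p, 4))

# Illustrative node counts only; the real fan-out depends on shard placement.
for nodes in (1, 4, 16):
    print(nodes, "nodes:", round(any_node_paused(p, nodes), 3))
```

Under this model, a request that touches fewer nodes hits a pause less often, which is at least consistent with smaller indexes showing the same delays less frequently. It is an estimate, not a diagnosis; comparing GC log timestamps against slow requests would be the real test.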

Nik


