Low priority queries or query throttling?

Hi,
I am currently having trouble with fairly slow and intensive queries
causing excessive load on my elasticsearch cluster and I would like to know
people's opinions on ways to mitigate or prevent that excessive load.

We attempt about 50 of these slow queries per second, and they take an
average of 300 ms each, which adds up to more than we can process, causing
excessive load and sometimes making elasticsearch unresponsive.
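
To put rough numbers on that, a back-of-envelope sketch (Python, purely
illustrative, assuming the 300 ms is wall-clock latency and that each query
fans out to all 5 shards):

qps = 50            # slow queries attempted per second
latency_s = 0.300   # average wall-clock latency per slow query
shards = 5          # shard searches each query fans out to
cores = 2 * 16      # two machines with 16 cores each

in_flight = qps * latency_s       # Little's law: ~15 slow queries in flight at any instant
shard_tasks = in_flight * shards  # upper bound: ~75 concurrent shard-level search tasks

print("~%d queries in flight, up to ~%d shard tasks competing for %d cores"
      % (in_flight, shard_tasks, cores))

So the slow queries alone can occupy a large share of the 32 cores before
the high priority traffic is even counted.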

The slow queries are all low priority, and we have other, high priority
queries running on the index. Slow queries could take 10 seconds to process
for all we care, and we'd rather have them fail than cause excessive load
on the cluster.

Is there a way to give these queries a lower priority and to force them to
use no more than a certain percentage of the cluster's resources? Or is it
possible to refuse certain types of queries if elasticsearch is under
excessive load?

I am also curious if people have thoughts on what could improve the
throughput of these queries based on the information given below. I can
give more details about the structure of the queries themselves if
necessary.

The cluster is made of two machines (each with 16 CPU cores and 15G of
memory allocated to elasticsearch) running ES 0.90.12 with the G1 garbage
collector. Load has been higher with this setup since I upgraded to 0.90.12
from 0.90.0 beta (which used the CMS collector with default settings).
However, several other changes were made at the same time, so it isn't yet
clear whether the version change or the GC change is responsible, or
whether it is simply a coincidence. Thoughts on that would be appreciated.

The index has 5 shards and one replica (so each machine holds a copy of
every shard), is a couple of gigabytes in size, and contains a couple
million documents.

Thanks!

What's the nature of the queries? There may be some optimizations that can
be made.

How much memory is on the box total?

I would not recommend G1 GC. It is promising but we still see bug reports
where G1 just straight up crashes. For now, the official ES recommendation
is still CMS. FWIW, G1 will use more CPU than CMS by definition, because
of the way G1 operates (e.g. shorter pauses at the cost of more CPU). That
could partially explain your increased load.

There is currently no way to give a priority to queries, although I agree
that would be very nice. There are some tricks you can do to control where
queries go using search preferences (see the search preference
documentation).
For example, you could send all "slow" queries to a single node, and send
all other queries to the rest of your cluster. That would effectively
bottleneck the slow queries, assuming the "slow node" has all the shards
required to execute the query.
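
As a rough sketch of what that could look like from the client side (the
node ID, index name, and query below are placeholders, and I'm using the
raw HTTP API via the Python requests library rather than any particular
client):

import requests

ES = "http://localhost:9200"            # placeholder: address of any node in the cluster
SLOW_NODE = "node-id-of-the-slow-box"   # placeholder: node ID reserved for slow queries

def slow_search(index, body):
    # Low-priority query: pin it to the designated "slow" node via the
    # preference parameter, so it only competes for that node's resources.
    resp = requests.post(
        "%s/%s/_search" % (ES, index),
        params={"preference": "_only_node:%s" % SLOW_NODE},
        json=body,
    )
    return resp.json()

def fast_search(index, body):
    # High-priority query: let ES route it normally across the cluster.
    return requests.post("%s/%s/_search" % (ES, index), json=body).json()

# e.g. slow_search("my_index", {"query": {"match_all": {}}})

Since both of your machines already hold a copy of every shard, the
_only_node restriction should always be satisfiable in your setup.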

Similarly, you can use Allocation Awareness and Forced Zones to control
which shards end up on which nodes, etc.
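
A rough sketch of that, assuming each node is tagged with a zone attribute
in its elasticsearch.yml (e.g. node.zone: fast on one machine and
node.zone: slow on the other; the attribute and zone names are
placeholders):

import requests

ES = "http://localhost:9200"  # placeholder: address of any node in the cluster

# Force shard copies to be spread across the "zone" attribute, so every
# zone (here: every machine) ends up holding a full copy of the index.
requests.put(
    "%s/_cluster/settings" % ES,
    json={
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": "zone",
            "cluster.routing.allocation.awareness.force.zone.values": "fast,slow",
        }
    },
)

With only two nodes and one replica you effectively have that layout
already, so this becomes more useful once the cluster grows and you still
want the "slow" node to keep a full copy of the index.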

Adding to what Zach said, I'd also be interested in looking at what causes
these queries to be so slow. Potentially their performance could be greatly
improved.

clint
