Match All query performance


(Aaron Mefford) #1

Is there any reason that match_all queries would be significantly affected
by index size?

With no sort, query clause, or other mechanism that requires scoring, it
seems it should just be a matter of fetching the first document from a
shard. In practice that does not seem to be the case. On a cluster with
more than sufficient RAM and no noticeable disk I/O, the match_all query
reports took times of 400-500ms. It also seems to use a significant amount
of CPU: with only 30 concurrent requests it drives the CPU to 100% and
causes heavy context switching on the cluster's nodes.
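
For readers who want to reproduce the measurement, here is a minimal
sketch of the request body being discussed and of how the took values
(milliseconds, as reported in each search response) could be summarized.
The helper names and the sample responses are hypothetical and are not
taken from the cluster in this thread.

```python
import json
from statistics import median


def match_all_body(size=1):
    """Build a minimal match_all search body that asks for `size` hits."""
    return {"query": {"match_all": {}}, "size": size}


def took_summary(responses):
    """Summarize the `took` values (milliseconds) of several search responses."""
    tooks = sorted(r["took"] for r in responses)
    return {"min": tooks[0], "median": median(tooks), "max": tooks[-1]}


if __name__ == "__main__":
    # The body that would be sent to the index's _search endpoint.
    print(json.dumps(match_all_body()))
    # Hypothetical responses, trimmed to the one field used here.
    sample = [{"took": 412}, {"took": 455}, {"took": 498}]
    print(took_summary(sample))
```

Comparing the summary for sequential requests with the one for concurrent
requests would show whether took grows with load or is already high for a
single request.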

The cluster in question is described in this post, though it now has 4 such
nodes and performance has not improved. Sairam has posted a few times
about it but each thread has just ended with no direction.

https://groups.google.com/d/msg/elasticsearch/P1o_4bVvECA/lDbCp_rCH_YJ

We were able to tweak the query with filters and sorts so that it is now
significantly faster than the match_all query, with took times as low as
8 ms where previously they were 800 ms.
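
The tweaked query itself is not shown in this thread. Purely as an
illustration of what "filters and sorts" can look like, the sketch below
builds a non-scoring constant_score query with a term filter plus an
explicit sort. The field names `status` and `created_at`, the filter
value, and the helper name are all hypothetical.

```python
def filtered_sorted_body(field, value, sort_field, size=1):
    """Build a search body that filters without scoring and sorts explicitly.

    This is an assumed shape, not the query used by the original poster.
    """
    return {
        # constant_score wraps a filter, so no relevance score is computed.
        "query": {"constant_score": {"filter": {"term": {field: value}}}},
        # Sort on a field instead of relying on score order.
        "sort": [{sort_field: {"order": "desc"}}],
        "size": size,
    }


if __name__ == "__main__":
    # Hypothetical field names and value, for illustration only.
    print(filtered_sorted_body("status", "active", "created_at"))
```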

Is there something that I am missing?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0e88051f-b3b1-44d1-87e5-26245b4e3ab3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

What you see on the CPU may be the overhead of spinning off tasks to be
executed on the segments. Perhaps your segment count is high and your
index needs optimizing.
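
One way to check the segment count is the indices segments API
(GET <index>/_segments). The sketch below counts segments per shard copy
from such a response. The response layout is assumed to be
indices -> index name -> shards -> shard id -> list of shard copies, each
with a "segments" object; the index name and sample data are hypothetical.

```python
def segments_per_shard(segments_response):
    """Count segments for every shard copy in a _segments API response."""
    counts = {}
    for index_name, index_data in segments_response["indices"].items():
        for shard_id, copies in index_data["shards"].items():
            # Each shard id maps to a list: the primary and any replicas.
            for copy_no, copy in enumerate(copies):
                counts[(index_name, shard_id, copy_no)] = len(copy["segments"])
    return counts


if __name__ == "__main__":
    # Hypothetical response, trimmed to the fields used above.
    sample = {
        "indices": {
            "myindex": {
                "shards": {
                    "0": [{"segments": {"_0": {}, "_1": {}, "_2": {}}}],
                    "1": [{"segments": {"_0": {}}}],
                }
            }
        }
    }
    for key, count in sorted(segments_per_shard(sample).items()):
        print(key, count)
```

If the counts are high, the optimize API (named force merge in later
Elasticsearch releases) with max_num_segments is the usual way to reduce
them.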

On an optimized index with 3 shards on 3 nodes on Red Hat Linux I see
match_all times around 20-50ms ("took" field).

Jörg

On Sun, Jul 6, 2014 at 6:35 PM, Aaron Mefford aaron@mefford.org wrote:

Is there any reason that match all queries would be impacted significantly
by index size?


