Is there any reason that match_all queries would be significantly affected
by index size?
It seems that in the absence of any sort, query, or other mechanism
requiring scoring, it should just be a matter of fetching the first document
from a shard. In practice that does not seem to be the case. On a cluster
with more than sufficient RAM, registering no noticeable disk I/O, the
match_all query is reporting "took" times of 400-500 ms. The match_all query
seems to use a significant amount of CPU, and when run concurrently it
drives the CPU to 100% with only 30 concurrent requests. This also causes a
significant amount of context switching on the nodes of the cluster.
The cluster in question is described in this post, though it now has 4 such
nodes and performance has not improved. Sairam has posted a few times
about it but each thread has just ended with no direction.
We were able to make some tweaks to the query with filters and sorts, such
that it is now significantly faster than the match_all query, with "took"
times as low as 8 ms where previously they were 800 ms.
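For anyone wanting to compare the two query variants on equal footing, here
is a minimal sketch that summarizes the "took" field (milliseconds) over
several parsed search responses. The helper name summarize_took and the
sample values are hypothetical; only the "took" field comes from the search
response itself.

```python
from statistics import median


def summarize_took(responses):
    """Return (min, median, max) of the "took" values, in milliseconds.

    `responses` is a list of search responses already parsed from JSON
    into dicts; only the top-level "took" field is read.
    """
    took = [r["took"] for r in responses]
    if not took:
        raise ValueError("no responses to summarize")
    return min(took), median(took), max(took)


if __name__ == "__main__":
    # Hypothetical sample values, not measurements from the cluster.
    sample = [{"took": 420}, {"took": 480}, {"took": 800}]
    print(summarize_took(sample))
```

Running each variant a number of times and comparing the medians is less
sensitive to a single slow request than comparing one-off numbers.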
What you see on the CPU may be the overhead of spinning off tasks to be
executed on the segments. Perhaps your segment count is high and your index
needs optimizing.
On an optimized index with 3 shards on 3 nodes on Red Hat Linux I see
match_all times around 20-50ms ("took" field).
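To check whether segment count is a plausible factor, one rough approach is
to count segments per shard copy from the output of the indices segments API
(GET /<index>/_segments). The sketch below assumes the response has already
been fetched and parsed into a dict, and that it has the usual
indices -> shards -> list of copies -> segments layout. The function name,
the index name "myindex", and the sample data are hypothetical.

```python
def segments_per_shard(segments_response):
    """List (index, shard_id, is_primary, segment_count) per shard copy.

    Assumed layout of the parsed _segments response:
    {"indices": {name: {"shards": {id: [{"routing": {...},
                                         "segments": {...}}]}}}}
    """
    rows = []
    for index_name, index_data in segments_response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            for copy in copies:
                is_primary = copy.get("routing", {}).get("primary", False)
                rows.append(
                    (index_name, shard_id, is_primary, len(copy.get("segments", {})))
                )
    return sorted(rows)


if __name__ == "__main__":
    # Hypothetical, hand-written sample; real responses carry more fields.
    sample = {
        "indices": {
            "myindex": {
                "shards": {
                    "0": [{"routing": {"primary": True},
                           "segments": {"_0": {}, "_1": {}, "_2": {}}}],
                    "1": [{"routing": {"primary": True},
                           "segments": {"_0": {}}}],
                }
            }
        }
    }
    for row in segments_per_shard(sample):
        print(row)
```

A shard copy with many segments is a candidate for optimizing; what counts
as "many" depends on the index and is not something this sketch decides.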
Jörg
On Sun, Jul 6, 2014 at 6:35 PM, Aaron Mefford aaron@mefford.org wrote: