Search Response Time for "match all" query

Hi all,

I'm experimenting with ES (version 0.90.5) to measure indexing/search
performance. My cluster setup is as follows:

  1. Master on a separate machine (node.master: true, node.data: false)
  2. 4 data nodes on separate machines (node.master: false, node.data:
    true)
  3. index.number_of_shards = 4 & index.number_of_replicas = 1

Those are the only settings I've changed in the elasticsearch.yml file of
the respective nodes.

I'm comparing the performance with Solr 4.4 (essentially SolrCloud). I
indexed 4 million documents, where each document has 10 English
sentences (each sentence is about 10 words).

So each primary shard holds approximately 1 million docs, which is what
I'd expect with 4 million docs and 4 shards.
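
As a quick sanity check on that layout, here is a small Python sketch of
the arithmetic. It assumes documents are spread evenly across primaries
and shard copies evenly across data nodes, which is an idealization; the
function name shard_layout is just something I made up for illustration.

```python
def shard_layout(total_docs, primaries, replicas, data_nodes):
    """Idealized shard arithmetic, assuming a perfectly even distribution."""
    docs_per_primary = total_docs // primaries
    # Each primary has `replicas` extra copies.
    shard_copies = primaries * (1 + replicas)
    copies_per_node = shard_copies / data_nodes
    return docs_per_primary, shard_copies, copies_per_node


if __name__ == "__main__":
    # 4 million docs, 4 primaries, 1 replica each, 4 data nodes
    print(shard_layout(4_000_000, 4, 1, 4))
```

So with 1 replica there are 8 shard copies in total, i.e. about 2 per
data node if the allocation is even.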

I'm running the following queries (the cluster is dedicated to this test;
no other processes are running on it apart from ES):

  1. "*" query - to get all docs (i.e. 4 million hits)
  2. "*" query with size set to 100 - to get only the top 100 hits
  3. a term query with size again set to 4 million to retrieve all rows
  4. the same term query with size set to 100 - to get only the top 100 hits
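
For concreteness, the four request bodies look roughly like the following
Python sketch. It assumes the "*" query is issued as match_all; the
helper name build_queries, the field name "text" and the term "example"
are placeholders for illustration, not the actual field and term from my
test.

```python
import json

TOTAL_DOCS = 4_000_000  # number of indexed documents in this test


def build_queries(field="text", term="example"):
    """Return the four search request bodies, keyed by query number.

    `field` and `term` are hypothetical placeholders.
    """
    match_all = {"match_all": {}}
    term_query = {"term": {field: term}}
    return {
        1: {"query": match_all, "size": TOTAL_DOCS},   # all docs
        2: {"query": match_all, "size": 100},          # top 100 only
        3: {"query": term_query, "size": TOTAL_DOCS},  # all matching rows
        4: {"query": term_query, "size": 100},         # top 100 only
    }


if __name__ == "__main__":
    for number, body in sorted(build_queries().items()):
        print(number, json.dumps(body, sort_keys=True))
```

The only difference between 1 and 2 (and between 3 and 4) is the size
parameter, so query 1 asks ES to return all 4 million hits in a single
response.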

It is encouraging to see the response times of queries 2, 3 & 4 when
compared to Solr (SolrCloud) - ES is 2-3 times faster for 2 & 4 and
consistently faster for 3.

But query 1 takes far longer than on SolrCloud: ES took 279 secs,
whereas SolrCloud took 55 secs.

Why is the "match_all" query in ES taking so much more time compared to
Solr? Is the "match_all" query a bottleneck / known limitation of ES?

Thanks,
Phani.
