Search Time (Response Time) for matchAll query


(Phani) #1

Hi all,

I'm experimenting with ES (version 0.90.5) to measure indexing/search
performance. My cluster setup is:

  1. Master on a separate machine (node.master: true, node.data: false)
  2. 4 data nodes on separate machines (node.master: false, node.data:
    true)
  3. index.number_of_shards = 4 and index.number_of_replicas = 1

Those are the only settings I've changed in the elasticsearch.yml file of
the respective nodes.
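
For illustration, here is a minimal sketch of the shard layout described above. It assumes the same two values could also be supplied as per-index settings at index-creation time rather than in elasticsearch.yml. The helper names are hypothetical and not from the post.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build an index-creation settings body (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard has `replicas` additional copies."""
    return primary_shards * (1 + replicas)


DATA_NODES = 4  # from the post

body = index_settings(4, 1)
copies = total_shard_copies(4, 1)
print(json.dumps(body, indent=2))
print(copies, "shard copies in total,",
      copies // DATA_NODES, "per data node if spread evenly")
```

With 4 primaries and 1 replica each, every data node would hold two shard copies, assuming the cluster balances them evenly.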

I'm comparing the performance with Solr 4.4 (essentially SolrCloud). I
indexed 4 million documents, each containing 10 English sentences of
about 10 words each.

Each primary shard therefore holds approximately 1 million docs, an even
split of the 4 million docs across the 4 shards.

I'm running the following queries (the cluster is dedicated; no other
processes are running on it apart from ES):

  1. "*" query with size set to 4 million, to get all docs (i.e. 4 million hits)
  2. "*" query with size set to 100, to get only the top 100 hits
  3. a term query with size again set to 4 million, to retrieve all rows
  4. the same term query with size set to 100, to get only the top 100 hits
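
As a rough sketch, the four requests above could be expressed as search bodies like the following. This assumes the "*" query corresponds to a `match_all` query, as the thread title suggests. The field name and term value are hypothetical placeholders, since the post does not give them.

```python
import json

TOTAL_DOCS = 4_000_000  # corpus size stated in the post


def match_all(size: int) -> dict:
    """Search body matching every document, returning up to `size` hits."""
    return {"query": {"match_all": {}}, "size": size}


def term_query(field: str, value: str, size: int) -> dict:
    """Search body for an exact-term match, returning up to `size` hits."""
    return {"query": {"term": {field: value}}, "size": size}


# Hypothetical field and term; the post does not name them.
FIELD, TERM = "text", "example"

queries = [
    match_all(TOTAL_DOCS),               # 1. all docs
    match_all(100),                      # 2. top 100 only
    term_query(FIELD, TERM, TOTAL_DOCS), # 3. all matching rows
    term_query(FIELD, TERM, 100),        # 4. top 100 matching rows
]

for number, body in enumerate(queries, start=1):
    print(number, json.dumps(body))
```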

The response times of queries 2, 3 and 4 are encouraging when compared
to Solr (SolrCloud): ES is 2-3 times faster for 2 and 4, and
consistently faster for 3.

But query 1 takes far longer than on SolrCloud: ES took 279 secs whereas
SolrCloud took 55 secs.

Why is the "matchAll" query in ES taking so much more time compared to
Solr? Is the "matchAll" query a bottleneck or a known limitation of ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

You should not fetch all docs to the client side with a default query.
Have a look at the scan and scroll API.

I'm not sure it will be faster, though.
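
A minimal sketch of the scan-and-scroll loop suggested above, in Python. It assumes the 0.90-era REST shape: an initial search with `search_type=scan` and a `scroll` keep-alive that returns only a scroll id, followed by repeated calls to `/_search/scroll` until a page comes back empty. The `send` transport and the in-memory fake are hypothetical stand-ins so the loop can run without a cluster.

```python
import json
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical transport: send(path, params, body) -> parsed JSON response.
Send = Callable[[str, Dict[str, str], Optional[str]], dict]


def scan_all(send: Send, index: str, query: dict,
             page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, one scroll page at a time."""
    first = send(
        f"/{index}/_search",
        {"search_type": "scan", "scroll": keep_alive, "size": str(page_size)},
        json.dumps({"query": query}),
    )
    scroll_id = first["_scroll_id"]  # the scan request itself returns no hits
    while True:
        page = send("/_search/scroll", {"scroll": keep_alive}, scroll_id)
        hits = page["hits"]["hits"]
        if not hits:  # an empty page means the scroll is exhausted
            break
        yield from hits
        scroll_id = page["_scroll_id"]  # always use the latest scroll id


def make_fake_send(docs: List[dict], per_page: int) -> Send:
    """In-memory stand-in for a cluster, used only for this demonstration."""
    state = {"pos": 0}

    def send(path: str, params: Dict[str, str], body: Optional[str]) -> dict:
        if path.endswith("/_search"):
            return {"_scroll_id": "s0", "hits": {"total": len(docs), "hits": []}}
        start = state["pos"]
        state["pos"] = start + per_page
        return {
            "_scroll_id": f"s{state['pos']}",
            "hits": {"total": len(docs), "hits": docs[start:start + per_page]},
        }

    return send


docs = [{"_id": str(i)} for i in range(5)]
fetched = list(scan_all(make_fake_send(docs, 2), "docs", {"match_all": {}}))
print(len(fetched), "hits fetched")
```

Because the transport is injected, the same loop could be pointed at a real HTTP client. Whether this is actually faster than a single large request would have to be measured on the cluster in question.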

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(searchelastic) #3

We are facing the same issue. Please look at our case below.

We have migrated from Solr (3.6) to ES (0.20.5). We create nearly 80
indexes per day with a total size of ~300GB. Among the 80 indexes, one
index is 30 to 40GB (150 million records) and some are 2 to 5GB.

With Solr we used 8 machines (4 indexers + 4 optimizers). The indexer job
created an index each hour with a merge factor of 1000. Once the hour
rotated, the previous hour's index was copied (scp) to an optimizer
machine, where we fully optimized the hourly index and merged it into the
day index. With this setup we achieved the best query performance for
everything except the one big index, because loading the big index ran
into OOM most of the time. So we decided to move to ES.

In ES we have 8 data nodes, 2 master machines and 4 client nodes. We know
approximately which indexes will be larger, so we decided on the number of
shards per index based on size: 5 shards with 1 replica for the big index
alone, and 1 shard + 1 replica for the rest. We use the QUERY_AND_FETCH
search type for the 1P+1R indexes and QUERY_THEN_FETCH for the 5P+1R
index.

In Solr, a query (a simple match-all or a field query) returns results in
less than a second for the small indexes, but in ES it takes 3 to 4 secs.
The big index takes 200 to 500 secs. I have shared my configuration
below. Can anyone suggest why we are seeing 3 secs in ES even for the
small indexes, and how to reduce the time for the big index? FYI: we are
moving to 0.90.5.

In Solr we set 4GB for the optimizer machines; in ES we set 8GB for all
nodes (master + data).

In elasticsearch.yml:

index.refresh_interval: 30s
index.merge.policy.max_merge_at_once: 3 (because we are not optimizing in
ES; slow indexing is acceptable)
index.merge.policy.segments_per_tier: 3
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 50mb
index.cache.field.type: soft
index.cache.field.max_size: 5000000
index.cache.field.expire: 15m
action.disable_delete_all_indices: true

index :
  analysis :
    analyzer :
      default_index :
        type : custom
        tokenizer : standard
        filter : [ word_delimiter, lowercase, snowball ]
      default_search :
        type : custom
        tokenizer : standard
        filter : [ word_delimiter, lowercase, snowball ]
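
The per-index choice of search type described in this post can be sketched as a small rule, shown here in Python. The function name and the index names are hypothetical. The lower-case strings are assumed to be the values passed as the REST `search_type` parameter.

```python
def search_type_for(primary_shards: int) -> str:
    """Pick a search type from the primary shard count, as the post does."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # 1P+1R indexes use query_and_fetch; multi-shard indexes use
    # query_then_fetch.
    return "query_and_fetch" if primary_shards == 1 else "query_then_fetch"


# Hypothetical index names; the shard counts come from the post.
index_shards = {"big-index": 5, "small-index": 1}

for name, shards in index_shards.items():
    print(name, "->", search_type_for(shards))
```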





(searchelastic) #5

We too are facing the same issue. Please look at our case below.

We have migrated from Solr (3.6) to ES (0.20.5). We create nearly 80
indexes per day with a total size of ~300GB. Among the 80 indexes, one
index is 30 to 40GB (150 million records) and some are 2 to 5GB.

With Solr we used 8 machines (4 indexers + 4 optimizers). The indexer job
created an index each hour with a merge factor of 1000. Once the hour
rotated, the previous hour's index was copied (scp) to an optimizer
machine, where we fully optimized the hourly index and merged it into the
day index. With this setup we achieved the best query performance for
everything except the one big index, because loading the big index ran
into OOM most of the time. So we decided to move to ES.

In ES we have 8 data nodes, 2 master machines and 4 client nodes. We know
approximately which indexes will be larger, so we decided on the number of
shards per index based on size: 5 shards with 1 replica for the big index
alone, and 1 shard + 1 replica for the rest. We use the QUERY_AND_FETCH
search type for the 1P+1R indexes and QUERY_THEN_FETCH for the 5P+1R
index.

In Solr, a query returns results in less than a second for the small
indexes, but in ES it takes 3 to 4 secs. The big index takes 200 to 500
secs. I have shared my configuration below. Can anyone suggest why we are
seeing 3 secs in ES even for the small indexes, and how to reduce the time
for the big index? FYI: we are moving to 0.90.5.

In Solr we set 4GB for the optimizer machines; in ES we set 8GB for all
nodes (master + data).

In elasticsearch.yml:

index.refresh_interval: 30s
index.merge.policy.max_merge_at_once: 3 (because we are not optimizing in
ES; slow indexing is acceptable)
index.merge.policy.segments_per_tier: 3
indices.store.throttle.type: merge
indices.store.throttle.max_bytes_per_sec: 50mb
index.cache.field.type: soft
index.cache.field.max_size: 5000000
index.cache.field.expire: 15m
action.disable_delete_all_indices: true

index :
  analysis :
    analyzer :
      default_index :
        type : custom
        tokenizer : standard
        filter : [ word_delimiter, lowercase, snowball ]
      default_search :
        type : custom
        tokenizer : standard
        filter : [ word_delimiter, lowercase, snowball ]


