How to achieve Query Performance


(Giri M) #1

Hi all,

We have migrated from Solr (3.6) to ES (0.20.5). We create nearly 80 indexes per day with a total size of ~300GB. Among the 80 indexes, one is 30 to 40GB (150 million records) and some are 2 to 5GB.

In Solr we used 8 machines (4 indexers + 4 optimizers). The indexer job creates an index each hour with a merge factor of 1000. Once the hour rotates, the previous hour's index is copied via scp to an optimizer machine, where we fully optimize the hour index and merge it into the day index. With this setup we achieved the best query performance for everything except the one big index: when loading the big index we hit OOM most of the time. So we decided to move to ES.

In ES we also have 8 data nodes, plus 2 master machines and 4 client nodes. We know approximately which indexes will be larger, so we decided to set the number of shards per index based on size. We set 5 shards with 1 replica for the big index alone; the remaining indexes have 1 shard + 1 replica.

In Solr, queries return in less than a second for small indexes, but in ES they take 3 to 4 secs. We use the QUERY_AND_FETCH search type for 1P+1R indexes and QUERY_THEN_FETCH for 5P+1R indexes. I have shared my configuration below. Can anyone suggest why we are getting 3 secs in ES even for small indexes? The big index takes 200 to 500 secs; how can we reduce this? FYI: we are moving to 0.90.5.

In Solr we set 4GB for the optimizer machines; in ES we set 8GB for all nodes (master + data).

In elasticsearch.yml

    index.refresh_interval: 30s
    index.merge.policy.max_merge_at_once: 3    # we are not optimizing in ES; slow indexing is acceptable
    index.merge.policy.segments_per_tier: 3
    indices.store.throttle.type: merge
    indices.store.throttle.max_bytes_per_sec: 50mb
    index.cache.field.type: soft
    index.cache.field.max_size: 5000000
    index.cache.field.expire: 15m
    action.disable_delete_all_indices: true

    index :
      analysis :
        analyzer :
          default_index :
            type : custom
            tokenizer : standard
            filter : [ word_delimiter, lowercase, snowball ]
          default_search :
            type : custom
            tokenizer : standard
            filter : [ word_delimiter, lowercase, snowball ]

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Can you give an example of a query?

Jörg



(Giri M) #3

The same data is indexed in both Solr and ES. FYI: we use the indexer machines as ES clients. Each record in a file is indexed in both Solr and ES.

All the queries are sorted by the time_stamp field in descending order.

In Solr (takes around 40 milliseconds for both default and field queries on small indexes)

By default we use :

For field query takes around 40 milliseconds for small indexes

fieldname:

Example: level:SEVERE

In ES (takes 200 to 500 milliseconds for both default and field queries on small indexes)

By default we use MatchAllQuery

    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.minimumNumberShouldMatch(1);
    MatchAllQueryBuilder builder = QueryBuilders.matchAllQuery();
    boolQueryBuilder.should(builder);

For Field query

    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.minimumNumberShouldMatch(1);
    MatchQueryBuilder builderM = QueryBuilders.matchQuery(field, value);
    builderM.operator(MatchQueryBuilder.Operator.AND);
    boolQueryBuilder.must(builderM);

Solr Conf

ES Conf

        "filename" : {"type" : "string", "omit_norms" : true, "store" : "yes", "index" : "no" },
        "doc_no" : {"type" : "integer", "store" : "yes", "index" : "no" },
        "req_id" : {"type" : "string", "omit_norms" : true, "index" : "not_analyzed" },
        "app_ip" : {"type" : "ip", "precision_step": 0 },
        "account" : {"type" : "string", "omit_norms" : true, "index" : "not_analyzed"},
        "thread_id" : {"type" : "integer", "precision_step": 0 },
        "level" : {"type" : "string", "omit_norms" : true, "index" : "not_analyzed"},
        "class_name" : {"type" : "string", "omit_norms" : true, "index" : "analyzed"},
        "method" : {"type" : "string", "omit_norms" : true, "index" : "analyzed"},
        "time_stamp" : {"type" : "long", "precision_step": 0 },
        "time_taken" : {"type" : "long", "precision_step": 0 },
        "message" : {"type" : "string", "omit_norms" : true, "index" : "analyzed"},
        "throwable" : {"type" : "string", "omit_norms" : true, "index" : "analyzed"}

On Wednesday, October 2, 2013 12:12:09 PM UTC+5:30, Jörg Prante wrote:

Can you give an example of a query?

Jörg



(David Pilato) #4

You should try filters instead of queries.
They are cached in most cases.

TermFilter will be appropriate in that case.

My 2 cents

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
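
As a rough illustration of why a cached filter gets cheap on repeat, here is a toy Java sketch: the set of documents matching a (field, value) pair is computed once as a bit set and then reused. `ToyFilterCache`, `termFilter` and `scans` are made-up names for this example. This is not how Elasticsearch implements its filter cache, only the general idea.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of filter caching; NOT Elasticsearch's actual implementation. */
final class ToyFilterCache {
    private final List<Map<String, String>> docs;
    private final Map<String, BitSet> cache = new HashMap<>();
    private int scans = 0; // how many times we had to scan every document

    ToyFilterCache(List<Map<String, String>> docs) {
        this.docs = docs;
    }

    /** Returns the positions of docs whose field equals value, scanning only on a cache miss. */
    BitSet termFilter(String field, String value) {
        String key = field + ":" + value;
        BitSet cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit: no scan needed
        }
        scans++;
        BitSet bits = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            if (value.equals(docs.get(i).get(field))) {
                bits.set(i);
            }
        }
        cache.put(key, bits);
        return bits;
    }

    int scans() {
        return scans;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String level : new String[]{"SEVERE", "INFO", "SEVERE"}) {
            Map<String, String> doc = new HashMap<>();
            doc.put("level", level);
            docs.add(doc);
        }
        ToyFilterCache cache = new ToyFilterCache(docs);
        cache.termFilter("level", "SEVERE"); // first call scans the docs
        cache.termFilter("level", "SEVERE"); // second call is served from the cache
        System.out.println("scans = " + cache.scans());
    }
}
```

The first call pays for the scan; later calls with the same field and value only pay for a map lookup, which is the effect to expect from a cached term filter on a repeated query.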



(Jörg Prante) #5

The sorting on timestamp is slow. Use scoring instead.

For matchAll query, you do not need a bool query with minimumShouldMatch.

Jörg



(Christian Th.) #6

Are you using an "AND" Operator in Solr and a "should" in Elasticsearch for
the "_all" query? This could make an impact.



(Matt Weber) #7

Are you using a single index with a single shard and no replicas? With Solr, unless you have explicitly set up SolrCloud to use multiple shards/replicas, you will be using a single index with a single shard and no replicas. Elasticsearch, on the other hand, defaults to 5 shards and 1 replica. This makes a difference, so make sure you are testing identical setups.

Also, as Joerg said, remove the outer boolean query; it's not needed unless you are going to combine multiple queries together (you are not in your example). If sorting by the timestamp, there is no need for scoring, so wrap everything in a constant score query as well. When sorting, you should also wait until caches are "warmed" before performing your tests. Do this with index warmers or execute a few test queries before you start benchmarks.

Thanks,
Matt Weber
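
The warm-up advice can be sketched as a small timing helper in plain Java: time the same task several times, drop the first few samples, and average the rest. `WarmBenchmark`, `time` and `meanAfterWarmup` are hypothetical names, and the task passed in would be whatever call executes the search (a no-op stands in for it here).

```java
/** Minimal benchmark helper that ignores warm-up runs; names are illustrative only. */
final class WarmBenchmark {

    /** Runs the task `runs` times and returns the elapsed wall-clock milliseconds of each run. */
    static long[] time(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Drops the first `warmups` samples and returns the mean of the remaining ones. */
    static double meanAfterWarmup(long[] millis, int warmups) {
        if (warmups < 0 || warmups >= millis.length) {
            throw new IllegalArgumentException("warmups must be in [0, number of samples)");
        }
        long sum = 0;
        for (int i = warmups; i < millis.length; i++) {
            sum += millis[i];
        }
        return (double) sum / (millis.length - warmups);
    }

    public static void main(String[] args) {
        // Placeholder task; in a real test this would execute the search request.
        long[] samples = time(() -> { }, 10);
        System.out.printf("mean after 3 warm-ups: %.1f ms%n", meanAfterWarmup(samples, 3));
    }
}
```

Applied to the four ES timings reported later in this thread (2588, 128, 98 and 144 ms) with one warm-up run dropped, the mean comes to roughly 123 ms; the four Solr timings (588, 44, 41 and 46 ms) give roughly 44 ms.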



(Giri M) #8

Hi All

@David I tried it. The following is the code snippet for the term query:

    SearchRequestBuilder searchBuilder = new SearchRequestBuilder(client); // Node Client
    searchBuilder.setIndices(new String[]{indexName});
    searchBuilder.setTypes(new String[]{String.valueOf(type)});
    if (sortneeded) {
        searchBuilder.addSort("time_stamp", SortOrder.DESC);
    }
    searchBuilder.addFields(new String[]{"doc_no", "filename"});
    searchBuilder.setFrom(from); // 1
    searchBuilder.setSize(size); // 100
    searchBuilder.setExplain(false);
    searchBuilder.setSearchType(SearchType.QUERY_AND_FETCH);
    searchBuilder.setPreference("_primary");

    // Term filter wrapped in a bool filter
    BoolFilterBuilder bf = FilterBuilders.boolFilter();
    FilterBuilder tfb = bf.should(FilterBuilders.termFilter(fieldname, value));
    searchBuilder.setFilter(tfb);

    // Measure the round-trip time on the client side
    long startMillis = System.currentTimeMillis();
    SearchResponse response = (SearchResponse) searchBuilder.execute().actionGet();
    long endMillis = System.currentTimeMillis();

The term query (status:200) was executed 4 times in a row in ES; below are the responses:

1. ES Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=2,588 ms
2. ES Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=128 ms
3. ES Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=98 ms
4. ES Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=144 ms

In Solr we issued the same query on the same index:

1. Solr Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=588 ms
2. Solr Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=44 ms
3. Solr Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=41 ms
4. Solr Response : IndexName=xx_1_2013_10_01 && NumFound=1,238,306 && TimeTaken=46 ms

@Matt

We are using classic Solr 3.6.2 with no replication. We back up our indices in DFS. To avoid this we are migrating to ES.

In ES, the one big index has 5 shards with 1 replica, and the other indices have 1 shard with 1 replica.

Can anyone please help me resolve the problem?



(Matt Weber) #9

You are not testing apples to apples, as I explained in my original post. Searching a single shard is going to be faster than searching 5 shards and merging the results on a single machine. Plus, it sounds like you are searching across 2 indices as well ("big" index and "small" index). For the purpose of testing/comparisons you need to make sure you are testing the exact same setups, and in this case that means making sure you have 1 shard and no replicas configured for your indices.

We have also provided you with suggestions on how to set up your queries in a more performant manner (remove the unnecessary boolean, use a constant score query, etc). Make all these changes, then let us know how it goes.

Thanks,
Matt Weber
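
To make the merge cost concrete, here is a minimal sketch of the extra work a sorted multi-shard search implies: each shard returns its own top hits, already sorted by time_stamp descending, and one node has to merge them into a single global top N. This is a simplified model with hypothetical names (`ShardMerge`, `mergeTop`), not Elasticsearch's code. With a single shard there is nothing to merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified model of merging per-shard sorted hits; names are illustrative only. */
final class ShardMerge {

    /**
     * Each inner list holds one shard's time_stamp values, sorted descending.
     * Returns the `size` largest values across all shards, also descending.
     */
    static List<Long> mergeTop(List<List<Long>> shards, int size) {
        // Heap entries are {shardIndex, positionInShard}; the entry pointing at the
        // largest time_stamp comes out first.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Long.compare(shards.get(b[0]).get(b[1]), shards.get(a[0]).get(a[1])));
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                heap.add(new int[]{s, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (merged.size() < size && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Long> shard = shards.get(top[0]);
            merged.add(shard.get(top[1]));
            if (top[1] + 1 < shard.size()) {
                heap.add(new int[]{top[0], top[1] + 1}); // advance within the same shard
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = Arrays.asList(
                Arrays.asList(90L, 50L, 10L),
                Arrays.asList(80L, 70L),
                Arrays.asList(60L));
        System.out.println(mergeTop(shards, 3));
    }
}
```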



(Anthony Campagna) #10

It's important to note that, no matter what, you are not going to get the same performance even if your test setups are 100% the same. The price of scalability is always speed. You choose Elasticsearch for its scalability (if you didn't, then you should stick with Solr), and the cost of that scalability in a distributed system is speed.



(simonw-2) #11

If your setup is roughly the same, you should see roughly the same numbers. Try 1 shard and 0 replicas as a start, and make sure both installs have the same number of segments (best is to call optimize after you are done indexing). Then go ahead and fire your search, and make sure you don't fetch different amounts of data. In the best case you would only return IDs or so for the benchmark. In a multi-shard environment ES does 2 round trips (1 for the query and a second to fetch results).

If you are using QUERY_AND_FETCH you are essentially loading "number_of_shards * size" documents, that is 400 docs more than Solr does; even if you just use 1ms per doc, that is a lot of time. In your case size is 100 documents per request, so you should use QUERY_THEN_FETCH (which is the default). I'd also run the query in a loop and discard the first n queries as warm-ups. It might also be useful to use the warmer API eventually, but for now that is not needed. If you want to expand your benchmark to more shards, put them on more machines as well.
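
The arithmetic behind the two search types, as a small sketch. It assumes the simplified model from the post above: with QUERY_AND_FETCH every shard loads `size` full documents, while with QUERY_THEN_FETCH only the final `size` documents are loaded in the second round trip. `FetchCost` and `docsLoaded` are hypothetical names for this example.

```java
/** Back-of-the-envelope count of full documents loaded per request; a simplified model. */
final class FetchCost {

    /**
     * @param shards        number of shards searched
     * @param size          requested page size
     * @param queryAndFetch true for QUERY_AND_FETCH, false for QUERY_THEN_FETCH
     * @return number of full documents loaded across all shards
     */
    static int docsLoaded(int shards, int size, boolean queryAndFetch) {
        // QUERY_AND_FETCH: every shard fetches its own top `size` documents.
        // QUERY_THEN_FETCH: only the global top `size` documents are fetched.
        return queryAndFetch ? shards * size : size;
    }

    public static void main(String[] args) {
        int shards = 5;
        int size = 100;
        int extra = docsLoaded(shards, size, true) - docsLoaded(shards, size, false);
        System.out.println("extra documents loaded by QUERY_AND_FETCH: " + extra);
    }
}
```

With 5 shards and a page size of 100 this gives 500 documents against 100, which matches the 400 extra documents mentioned in the post. On a single-shard index the two search types load the same number of documents in this model.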


