Improving custom_score query execution time


(Nitish Sharma) #1

Hi,
We are using Elasticsearch for our production search. So far, we have been
sorting all results in chronological order. On average, a query takes
300-400 ms to finish. The index holds ~150 million documents
on a 5-node ES cluster with 10 shards and 1 replica.
We are trying to move away from this time-based ordering. The custom_score
query comes in handy here, but it takes much longer than we would like
(around 4-5 seconds). Currently, we are using an MVEL script.
Any suggestions for improvement here? Would writing the script in Java and
deploying it on the ES nodes (rather than passing it in the query JSON every
single time) make a difference?
TIA.
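
For illustration, a minimal sketch of the two request shapes being compared: a custom_score body that carries the MVEL script text on every request, and one that refers by name to a native (Java) script already registered on the nodes. The helper names, the field names reach_a and reach_b, the script name reach_score, and the weight parameter are all hypothetical; the thread does not give the actual script. Whether the native variant is faster for this workload would have to be measured.

```python
import json


def custom_score_inline(inner_query, script, params):
    """Build a custom_score body that ships the MVEL script text with the request."""
    return {
        "query": {
            "custom_score": {
                "query": inner_query,
                "script": script,
                "params": params,
            }
        }
    }


def custom_score_native(inner_query, script_name, params):
    """Build a custom_score body that refers to a native script by its registered name.

    Assumes a Java script factory has been registered on every node under
    `script_name`; only the name and the params travel with the request.
    """
    return {
        "query": {
            "custom_score": {
                "query": inner_query,
                "lang": "native",
                "script": script_name,
                "params": params,
            }
        }
    }


if __name__ == "__main__":
    inner = {"match_all": {}}
    # Hypothetical MVEL expression over two hypothetical numeric fields.
    mvel = "_score * (doc['reach_a'].value + doc['reach_b'].value) * weight"
    print(json.dumps(custom_score_inline(inner, mvel, {"weight": 0.5}), indent=2))
    print(json.dumps(custom_score_native(inner, "reach_score", {"weight": 0.5}), indent=2))
```

Passing tunable values through params rather than embedding them in the script text keeps the script string identical across requests.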

Cheers
Nitish


(Clinton Gormley) #2

On Tue, 2012-07-17 at 05:42 -0700, Nitish Sharma wrote:

Hi,
We are using Elasticsearch for our production search. So far, we have
been sorting all results in chronological order. On average, a query
takes 300-400 ms to finish. The index holds ~150 million documents
on a 5-node ES cluster with 10 shards and 1 replica.
We are trying to move away from this time-based ordering. The custom_score
query comes in handy here, but it takes much longer than we would like
(around 4-5 seconds). Currently, we are using an MVEL script.
Any suggestions for improvement here? Would writing the script in Java and
deploying it on the ES nodes (rather than passing it in the query JSON every
single time) make a difference?

How are you trying to order your docs? You can usually do it without
scripting.

Also, for the time-based sorting, you probably don't need to sort 150
million docs. If you add a filter for docs with a timestamp greater than
$start_date, you can probably speed things up.


(Nitish Sharma) #3

The "new" ordering depends on the freshness factor of a document (derived
from the created_at field) and its reachability (which depends on 2 numeric
fields). What are the possible ways to achieve this without scripting? One
requirement we have is that it should be easy to switch between time-based
ordering and custom_score-based ordering.
We are not sorting all 150 million docs. "sort" is applied to the set of
results for a particular query. Moreover, there is no fixed $start_date,
since documents are continuously added to the index with a _ttl of 30
days. On average, the index consists of 150 million docs. Am I
misunderstanding your suggestion here?
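
As a rough sketch of the "easy to switch" requirement, the same inner query can be wrapped in one of two ways depending on a flag: a plain sort on created_at, or a custom_score wrapper. The function name, the fields reach_a and reach_b, and the scoring expression are hypothetical placeholders, since the real formula is not stated in this thread.

```python
import json

# Hypothetical MVEL expression: reachability damped by document age in days.
# `now_ms` and `ms_per_day` are supplied as script params by the caller.
HYPOTHETICAL_SCRIPT = (
    "(doc['reach_a'].value + doc['reach_b'].value)"
    " / (1.0 + (now_ms - doc['created_at'].value) / ms_per_day)"
)


def build_search_body(inner_query, ordering, now_ms, size=10):
    """Return a search body using either time-based or custom_score ordering."""
    if ordering == "time":
        # Chronological ordering: newest documents first.
        return {
            "query": inner_query,
            "sort": [{"created_at": {"order": "desc"}}],
            "size": size,
        }
    if ordering == "score":
        # Script-based ordering: hits are ranked by the computed score.
        return {
            "query": {
                "custom_score": {
                    "query": inner_query,
                    "script": HYPOTHETICAL_SCRIPT,
                    "params": {"now_ms": now_ms, "ms_per_day": 86400000.0},
                }
            },
            "size": size,
        }
    raise ValueError("ordering must be 'time' or 'score'")


if __name__ == "__main__":
    inner = {"term": {"body": "elasticsearch"}}
    print(json.dumps(build_search_body(inner, "time", 1342526400000), indent=2))
    print(json.dumps(build_search_body(inner, "score", 1342526400000), indent=2))
```

Keeping the inner query untouched in both branches means the two orderings always rank the same set of hits.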

Cheers
Nitish


(Clinton Gormley) #4

On Tue, 2012-07-17 at 08:05 -0700, Nitish Sharma wrote:

The "new" ordering depends on the freshness factor of a document
(derived from the created_at field) and its reachability (which depends
on 2 numeric fields). What are the possible ways to achieve this without
scripting?

Can you give an actual example? Otherwise it is difficult to suggest
anything.

One requirement we have is that it should be easy to switch between
time-based ordering and custom_score-based ordering.
We are not sorting all 150 million docs. "sort" is applied to the set
of results for a particular query. Moreover, there is no fixed
$start_date, since documents are continuously added to the index
with a _ttl of 30 days. On average, the index consists of 150 million
docs. Am I misunderstanding your suggestion here?

Maybe, maybe not. Again, lack of details. Given your _ttl of 30 days,
perhaps my suggestion won't work. I was thinking more of a scenario
where you have (e.g.) an archive spanning 3 years, but you know that the
chances are good that the 10 results you actually need will all come
from the last month, in which case you can add a "less than 1 month old"
filter to your query to reduce the number of hits.
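
A minimal sketch of that idea, assuming a date field named created_at and treating "1 month" as 30 days: the original query is wrapped in a filtered query whose range filter only admits documents newer than a cutoff computed on the client. The helper name is hypothetical.

```python
import json
from datetime import datetime, timedelta


def with_recency_filter(inner_query, now, days=30, field="created_at"):
    """Wrap `inner_query` so only documents newer than `now - days` are considered."""
    cutoff = now - timedelta(days=days)
    return {
        "query": {
            "filtered": {
                "query": inner_query,
                "filter": {
                    "range": {
                        # Lower bound only: keep docs at or after the cutoff.
                        field: {"gte": cutoff.strftime("%Y-%m-%dT%H:%M:%S")}
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = with_recency_filter({"match_all": {}}, datetime(2012, 7, 17, 12, 0, 0))
    print(json.dumps(body, indent=2))
```

The filter shrinks the set of hits that must be sorted or scored, which is where the saving would come from if the useful results really are concentrated in the recent window.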

clint

