On Tue, 2012-07-17 at 08:05 -0700, Nitish Sharma wrote:
> The "new" ordering depends on a freshness factor for each document
> (derived from the created_at field) and on reachability (which depends
> on 2 numeric fields). What are the possible ways to achieve this
> without scripting?
Can you give an actual example? Otherwise it is difficult to suggest
anything.
> One requirement we have is that it should be easy to switch between
> time-based ordering and custom_score-based ordering.
> We are not sorting all 150 million docs. "sort" is applied only to the
> set of results for a particular query. Moreover, there is no fixed
> $start_date, since documents are continuously added to the index
> with a _ttl of 30 days. On average, the index holds 150 million
> docs. Am I misunderstanding your suggestion here?
Maybe, maybe not. Again, I lack the details. Given your _ttl of 30 days,
perhaps my suggestion won't work. I was thinking more of a scenario
where you have (e.g.) an archive spanning 3 years, but you know the
chances are good that the 10 results you actually need will all come
from the last month, in which case you can add a "less than 1 month old"
filter to your query to reduce the number of hits.
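
A minimal sketch of the "less than 1 month" filter idea, written as a
Python helper that builds the search body. It assumes the "filtered"
query and "range" filter syntax of Elasticsearch releases from around
the time of this thread (later releases replaced "filtered" with a
bool query and a filter clause), and that created_at is indexed as a
date. The helper name recent_docs_query and its parameters are
hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_docs_query(user_query, days=30, size=10, now=None):
    """Wrap user_query in a date-range filter and sort newest first.

    Hypothetical helper: the "filtered" query shape is an assumption
    based on 2012-era Elasticsearch; adapt it to your version.
    """
    now = now or datetime.now(timezone.utc)
    # Lower bound for created_at, formatted as an ISO 8601 UTC timestamp.
    start_date = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "query": {
            "filtered": {
                "query": user_query,
                # Only docs newer than start_date are scored and sorted.
                "filter": {"range": {"created_at": {"gte": start_date}}},
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],
        "size": size,
    }
```

The filter only helps when the window is narrower than the retention
period, so with a 30-day _ttl a shorter window (say days=7) would be
needed to cut the number of hits.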
clint
Cheers
Nitish
On Tuesday, July 17, 2012 3:04:11 PM UTC+2, Clinton Gormley wrote:
On Tue, 2012-07-17 at 05:42 -0700, Nitish Sharma wrote:
> Hi,
> We are using Elasticsearch for our production search. So far, we have
> been sorting all the results in chronological order. On average it
> takes 300-400 ms to finish the query. The index holds ~150 million
> documents on a 5-node ES cluster with 10 shards and 1 replica.
> We are trying to move away from this time-based ordering. The
> custom_score query comes in handy here. But this query takes around
> 4-5 seconds, which is longer than we would like. Currently, we are
> using an MVEL script.
> Any suggestions for improvement here? Would writing the script in Java
> and deploying it on the ES nodes (rather than passing it in the query
> JSON every single time) make a difference?
How are you trying to order your docs? You can usually do it without
scripting.
Also, for the time-based sorting, you probably don't need to sort 150
million docs. If you add a filter for docs with a timestamp greater
than $start_date, you can probably speed things up.
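
The requirement raised in this thread, switching easily between
time-based and custom_score-based ordering, could be handled by a
small query builder on the client side. The sketch below is only an
illustration: build_search_body is a hypothetical helper, the
custom_score body (query, script, params, lang) follows the 2012-era
Elasticsearch syntax as an assumption (later releases replaced
custom_score with function_score), and the script shown in the usage
note is a placeholder, not the poster's actual formula.

```python
def build_search_body(user_query, ordering="time", script=None,
                      params=None, size=10):
    """Return a search body using either time-based or scripted ordering.

    Hypothetical helper; the scoring script is supplied by the caller.
    """
    if ordering == "time":
        # Chronological ordering: newest documents first.
        return {
            "query": user_query,
            "sort": [{"created_at": {"order": "desc"}}],
            "size": size,
        }
    if ordering == "custom":
        if script is None:
            raise ValueError("custom ordering needs a scoring script")
        # Scripted ordering: results come back sorted by computed score.
        return {
            "query": {
                "custom_score": {
                    "query": user_query,
                    "script": script,
                    "params": params or {},
                    "lang": "mvel",
                }
            },
            "size": size,
        }
    raise ValueError("unknown ordering: %r" % (ordering,))
```

For example, build_search_body(q, ordering="custom",
script="_score * boost", params={"boost": 2.0}) produces the scripted
variant, while build_search_body(q) keeps the chronological sort.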