How to sort on small cluster with 100m+ documents without OOME

Hi,

I have to query my data and sort the results by some field (a datetime, for example).

If I query the indexes with various criteria, there is no problem until
sorting is added. I expected only the filtered data to be sorted (about 1k
matches out of 100m+), but it seems the engine tries to load all the field
values into memory, which causes an OOME.

The question is whether there is a way to sort only the filtered data.
(I do not store the document's source.)

Do you have any experience with this case?

Thanks a lot for any advice.

Regards.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I'm somewhat pessimistic about sorting, from both a technical and a
user-experience standpoint:

  • Technically - if you're not caching masses of values in RAM (your current
    problem) you're having to hit disk randomly to retrieve these values for a
    large number of matching docs (which is slow).
  • From a user-experience point of view: a lot of what Lucene has to offer
    is in relevance-ranked ordering of partial matches i.e. it is designed to
    produce fuzzy sets of results where, unlike databases, membership of the
    set is measured to a degree. Sorting a fuzzy set (or, for that matter,
    producing facet counts) only helps surface the low-quality matches that may
    otherwise be lurking unseen at the long-tail end of that very large fuzzy
    set.
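
To put rough numbers on the first point, here is a back-of-the-envelope
sketch. It assumes the sort field is held as one 64-bit value per document
and ignores any overhead of the cache structure itself, so the real figure
will differ; the function name is made up for illustration.

```python
# Assumption: one 64-bit (8-byte) value per document, e.g. a datetime
# kept as a long. Real in-memory structures add overhead on top of this.
BYTES_PER_VALUE = 8

def sort_values_bytes(doc_count, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of the sort values for doc_count documents."""
    return doc_count * bytes_per_value

all_docs = sort_values_bytes(100_000_000)  # every document in the index
matched = sort_values_bytes(1_000)         # only the ~1k filtered matches

print(f"all docs:     {all_docs / 2**20:.0f} MiB")  # all docs:     763 MiB
print(f"matched docs: {matched / 2**10:.1f} KiB")   # matched docs: 7.8 KiB
```

That total is spread across the shards of the cluster, but it is paid per
sorted field, which is why a small cluster runs out of heap even though only
about 1k documents match.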

Sometimes you can solve the user's problem a different way. For example, if
you are sorting most-recent-first because you want to let a user see current
content, then consider the Google search approach instead: they offer users
a range of filters ("past hour", "past week", etc.).
These are technically more efficient to implement and users benefit from
the natural relevancy ranking order keeping highest-quality matches first
and the long-tail of low-quality crud out of sight.
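
A minimal sketch of what such a filter could look like as a request body,
built in Python. The field names "body" and "timestamp" and the helper
build_recent_query are hypothetical, and the bool/filter form shown is the
one used by recent Elasticsearch versions (older releases expressed the same
idea with a filtered query), so treat this as an illustration rather than
something to paste in as-is.

```python
import json

# Hypothetical field names; adjust to your own mapping.
TEXT_FIELD = "body"
DATE_FIELD = "timestamp"

# Date-math expression for each user-facing time window.
WINDOWS = {
    "past hour": "now-1h",
    "past day": "now-1d",
    "past week": "now-1w",
}

def build_recent_query(text, window, size=10):
    """Build a search body that filters by recency instead of sorting.

    There is no "sort" clause, so hits come back in relevance order and
    no per-document sort values have to be loaded.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored part: decides the ranking of the hits.
                "must": [{"match": {TEXT_FIELD: text}}],
                # Unscored part: only restricts the candidate set.
                "filter": [{"range": {DATE_FIELD: {"gte": WINDOWS[window]}}}],
            }
        },
    }

if __name__ == "__main__":
    print(json.dumps(build_recent_query("disk failure", "past week"), indent=2))
```

The resulting dictionary can be sent as the body of a search request with
whichever client you already use.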
