How to sort on small cluster with 100m+ documents without OOME

Marek_Skorek · April 15, 2013, 10:41am

Hi,

I have to query my data and sort it by some field (a datetime, for example).

Querying the indexes with various criteria works without problems until
sorting is added. I expected only the filtered data to be sorted (~1k
matches out of 100m+ documents), but it seems the engine tries to load all
of the field's values into memory, which causes an OOME.

My question: is there a way to sort only the filtered data?
(I do not store the documents' source.)

Do you have any experience with this case?

Many thanks for any advice.

Regards.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Mark_Harwood1 · April 16, 2013, 2:32pm

I'm somewhat pessimistic about sorting, from both a technical and a
user-experience standpoint:

Technically: if you're not caching masses of values in RAM (your current
problem), you have to hit disk randomly to retrieve those values for a
large number of matching docs, which is slow.

From a user-experience point of view: a lot of what Lucene has to offer
is relevance-ranked ordering of partial matches, i.e. it is designed to
produce fuzzy sets of results where, unlike in databases, membership of the
set is a matter of degree. Sorting a fuzzy set (or, for that matter,
producing facet counts) only helps surface the low-quality matches that may
otherwise lurk unseen at the long-tail end of that very large fuzzy
set.
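
To put the technical point in rough numbers, here is a back-of-the-envelope sketch of what caching one sort field in RAM can cost. It assumes one uncompressed 8-byte value per document (as for a datetime held as a long) and that values are loaded for every document in the index, not only the ~1k matches, as the original post observes. The function name and the 8-byte figure are illustrative assumptions, not measurements from this cluster.

```python
def fielddata_bytes(num_docs: int, bytes_per_value: int = 8) -> int:
    """Rough in-memory size of a single-valued numeric sort field.

    Assumes one fixed-width value per document and no compression;
    real usage depends on version, field type and shard layout.
    """
    return num_docs * bytes_per_value


total = fielddata_bytes(100_000_000)  # 100m documents, one long each
print(f"{total / 1024**2:.0f} MiB")   # prints "763 MiB"
```

That total is spread over the nodes holding the shards, so on a small cluster each node carries a large share of it, for every field that is sorted on.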

Sometimes you can solve the user's problem a different way. For example, if
you are sorting most-recent-first because you want users to see current
content, consider the Google search approach instead: they offer users
a range of filters ("past hour", "past week", etc.).
These are technically more efficient to implement, and users benefit from
the natural relevance ranking, which keeps the highest-quality matches first
and the long tail of low-quality crud out of sight.

