Sorting docs as input to facet phase

Tikitu_de_Jager · September 3, 2013, 1:31pm

Hi folks,

I'm building a custom facet that would benefit greatly if I could feed it
its documents in a predefined order: the space requirements are much smaller
if I can guarantee that all documents that share the same value on a
particular field pass through the facet collector in one bunch.

I.e., this ordering is cheap (grouping by "tweet"):

{"tweet": 3, "label": 5}
{"tweet": 3, "label": 7}
{"tweet": 4, "label": 3}
{"tweet": 4, "label": 5}

but this is expensive:

{"tweet": 3, "label": 5}
{"tweet": 4, "label": 3}
{"tweet": 3, "label": 7}
{"tweet": 4, "label": 5}

(I'm only interested in aggregate statistics across all "tweet" values, but
I can't calculate the per-tweet value until I'm sure no more labels are
coming -- the actual case is somewhat more complicated, and involves some
timestamp calculations as well, but I think that's irrelevant.)
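
To make the cost difference concrete, here is a minimal sketch in plain Java. It uses no Elasticsearch APIs; the class and method names are hypothetical, and "sum the labels per tweet, then take the maximum across tweets" is only an assumed stand-in for the real per-tweet calculation. With grouped input the collector holds one running value; with arbitrary order it has to keep an entry for every tweet until the last document has been seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: each doc is {tweet, label}.
class GroupedFacetSketch {

    /** Grouped input: state is a single running sum for the current tweet. */
    static long maxSumGrouped(int[][] docs) {
        long best = Long.MIN_VALUE;
        long running = 0;
        int current = 0;
        boolean open = false;
        for (int[] d : docs) {
            if (open && d[0] != current) {
                // Tweet changed, so the previous tweet is complete.
                best = Math.max(best, running);
                running = 0;
            }
            current = d[0];
            open = true;
            running += d[1];
        }
        if (open) {
            best = Math.max(best, running);
        }
        return best;
    }

    /** Arbitrary order: one map entry per distinct tweet until the end. */
    static long maxSumUngrouped(int[][] docs) {
        Map<Integer, Long> sums = new HashMap<>();
        for (int[] d : docs) {
            sums.merge(d[0], (long) d[1], Long::sum);
        }
        long best = Long.MIN_VALUE;
        for (long s : sums.values()) {
            best = Math.max(best, s);
        }
        return best;
    }
}
```

The grouped version is only correct when all documents for a tweet really are adjacent, which is exactly the guarantee being asked for.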

Is there any way to achieve this? I'm thinking maybe using nested documents
("tweet" is actually a parent-doc ID, but I was hoping to use parent/child
docs to avoid the reindexing requirement).

Regards,
Tikitu

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Tikitu_de_Jager · September 7, 2013, 10:13am

In case anyone finds this in the archives: as far as I can see this is
basically not possible at all with parent/child docs (at least without
reimplementing sorting yourself).

Nested docs, on the other hand, guarantee that the parent and children are
adjacent in the same segment; it's quite easy to make a custom Collector
which hands off both the parent and child docIds to (specialised versions
of) doCollect():

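    /**
     * Modified from
     * org.elasticsearch.index.search.nested.NestedChildrenCollector.java
     *
     * Collect root doc first, then all nested docs; send them to different
     * methods.
     */
    @Override
    public void collect(int parentDoc) throws IOException {
        // Doc 0 has no room for nested docs before it. Note that this
        // early return also skips doCollectParent() for that doc.
        if (parentDoc == 0 || parentDocs == null) {
            return;
        }
        doCollectParent(parentDoc);
        // Nested docs sit directly before their parent in the segment, so
        // walk back from the parent until the previous parent is reached.
        int prevParentDoc = parentDocs.prevSetBit(parentDoc - 1);
        for (int i = (parentDoc - 1); i > prevParentDoc; i--) {
            if (!currentReader.isDeleted(i) && childDocs.get(i)) {
                doCollectChild(i);
            }
        }
    }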
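
For anyone who wants to try the backwards walk outside Elasticsearch, here is a minimal standalone sketch. It is an assumption-laden simplification: java.util.BitSet and its previousSetBit() stand in for the Lucene bit sets used above, the deleted-doc check is left out, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the collector's inner loop; no Lucene types.
class NestedBlockWalkSketch {

    /** Returns the nested doc IDs of parentDoc, nearest to the parent first. */
    static List<Integer> childrenOf(int parentDoc, BitSet parentDocs, BitSet childDocs) {
        List<Integer> children = new ArrayList<>();
        if (parentDoc == 0) {
            return children; // nothing can precede doc 0
        }
        // previousSetBit returns -1 when there is no earlier parent.
        int prevParentDoc = parentDocs.previousSetBit(parentDoc - 1);
        for (int i = parentDoc - 1; i > prevParentDoc; i--) {
            if (childDocs.get(i)) {
                children.add(i);
            }
        }
        return children;
    }
}
```

With a segment laid out as child, child, parent, child, child, child, parent (doc IDs 0 to 6), asking for the children of doc 6 walks back over 5, 4 and 3 and stops at the parent in doc 2.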

