Getting correct counts after aggregation?


(James) #1

I am using a filter query and want to return a user-defined number of results sorted by date, descending. I can do this by setting the size on the query and adding a sort, and everything looks good. However, I need the results to be deduped, and the count to be the number of deduped documents.

When I try to dedup (by adding a terms aggregation on my primary ID field with a top-hits sub-aggregation of size 1), the hit results are still fine, although not deduped. However, the documents in the aggregation buckets are not the deduped version of my results, which is what I'm after. They seem to be collected from a totally different set of hits.

Two questions:

  1. What is the correct way to dedup the result hits while keeping the correct hits and sorting?

  2. Is it possible for ES to return me a certain number of documents AFTER deduping?

    SearchRequestBuilder srb = client.prepareSearch()
        .setIndices("indexA")
        .setScroll(new TimeValue(60000))
        .setQuery(builder)
        // Dedup attempt: one bucket per dedupId, keeping only the newest document in each.
        .addAggregation(AggregationBuilders.terms("docs")
            .field("dedupId")
            .subAggregation(AggregationBuilders
                .topHits("top_hits")
                .setSize(1)
                .addSort("orderDTS", SortOrder.DESC)))
        // Top-level hits: newest first, capped at the user-defined size.
        .addSort("orderDTS", SortOrder.DESC)
        .setFrom(0)
        .setSize(maxNumResults);
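
To make the desired outcome concrete, here is a minimal client-side sketch in plain Java with no Elasticsearch calls. The names `DedupSketch`, `Doc`, and `dedupNewestFirst` are hypothetical and exist only for illustration; only `dedupId`, `orderDTS`, and `maxNumResults` come from the query above. It shows the semantics I'm after (newest document per `dedupId`, then the newest N of those), not how ES should compute it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {
    // Hypothetical stand-in for a search hit; holds only the two fields used above.
    static final class Doc {
        final String dedupId;
        final long orderDTS;

        Doc(String dedupId, long orderDTS) {
            this.dedupId = dedupId;
            this.orderDTS = orderDTS;
        }
    }

    // Returns at most maxNumResults documents, newest first, one per dedupId.
    static List<Doc> dedupNewestFirst(List<Doc> docs, int maxNumResults) {
        List<Doc> sorted = new ArrayList<>(docs);
        // Sort by orderDTS descending so the first document seen per id is the newest.
        sorted.sort((a, b) -> Long.compare(b.orderDTS, a.orderDTS));

        // LinkedHashMap keeps insertion order, i.e. the date-descending order.
        Map<String, Doc> newestPerId = new LinkedHashMap<>();
        for (Doc d : sorted) {
            if (newestPerId.size() >= maxNumResults) {
                break; // enough deduped documents collected
            }
            newestPerId.putIfAbsent(d.dedupId, d);
        }
        return new ArrayList<>(newestPerId.values());
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("A", 10L), new Doc("B", 20L), new Doc("A", 30L), new Doc("C", 5L));
        // Prints A@30 then B@20: the older duplicate of A is dropped.
        for (Doc d : dedupNewestFirst(docs, 2)) {
            System.out.println(d.dedupId + "@" + d.orderDTS);
        }
    }
}
```

With this behavior, the count I want is simply the size of the returned list, i.e. the number of deduped documents rather than the number of raw hits.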

