TopHits vs 2 queries

jonsgreen · November 9, 2016, 9:36pm

I am doing a basic request in which I run a terms aggregation grouping by an id associated with another document, which is actually nested within the documents themselves. To be more specific, these are GNIP tweet results and I am grouping retweets by the original tweet document. At first I was doing this as two queries: first to get the ids, and then to get the original tweet documents. However, this proved to be one of our least performant requests, so I experimented with adding a top_hits aggregation instead, since the information about the original tweet and author is contained within the retweet document.
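To make the two-query version concrete, here is a minimal sketch in Python that only builds the request bodies. The field names `verb`, `object.id` and `id`, the aggregation name `by_original`, and the helper names are assumptions for illustration, not the actual mapping or code.

```python
import json


def build_retweet_groups_body(group_field="object.id", size=10):
    """Query 1: bucket retweets by the id of the original tweet (assumed field)."""
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "query": {"term": {"verb": "share"}},  # assumed marker for retweets
        "aggs": {
            "by_original": {"terms": {"field": group_field, "size": size}}
        },
    }


def build_originals_body(agg_response, id_field="id"):
    """Query 2: fetch the original tweets for the bucket keys from query 1."""
    buckets = agg_response["aggregations"]["by_original"]["buckets"]
    keys = [bucket["key"] for bucket in buckets]
    return {"size": len(keys), "query": {"terms": {id_field: keys}}}


if __name__ == "__main__":
    print(json.dumps(build_retweet_groups_body(), indent=2))
```

The second body is built from the bucket keys of the first response, so the two requests have to run one after the other.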

I was disappointed, however, to find that adding the top_hits aggregation ended up taking more time, even though I only have size 1 and set the collect_mode to 'breadth_first' on the terms aggregation.
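For reference, the single-request version looks roughly like the sketch below, again only building the body in Python. The field name `object.id`, the aggregation names, the `_source` filter, and the reading that size 1 applies to top_hits are assumptions for illustration.

```python
import json


def build_top_hits_body(group_field="object.id", size=10):
    """One request: terms buckets plus one representative retweet per bucket."""
    return {
        "size": 0,  # skip the regular hits, only aggregations are needed
        "aggs": {
            "by_original": {
                "terms": {
                    "field": group_field,  # assumed id of the original tweet
                    "size": size,
                    "collect_mode": "breadth_first",
                },
                "aggs": {
                    "sample_retweet": {
                        # one retweet per bucket; it embeds the original tweet
                        "top_hits": {"size": 1, "_source": ["object"]}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_top_hits_body(), indent=2))
```

Restricting `_source` to the embedded original is just one way to keep the response small; whether it changes the timing here is something I have not verified.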

I am curious whether it is to be expected that the 2 simpler queries would take less time than 1 query with the top_hits sub-aggregation.

If there is a better strategy for me to consider please let me know.
