Hi, I have a collection of articles with different authors, titles and revisions. Can I run a search that returns, for each author+title group, the articles with the highest revision (records in bold)?
Author | PublishedDate            | Revision | Title
-------+--------------------------+----------+-------------
James  | 2019-02-04T00:00:00.000Z | 1        | I wonder why
James  | 2019-03-04T00:00:00.000Z | 2        | I wonder why
Parker | 2019-03-04T00:00:00.000Z | 1        | The Endgame
I tried a terms + max aggregation with top_hits, but the returned top hit is not the record with the max revision (=2):
The max and top_hits are two independent summaries - the max calculation has no influence on the top hits.
You need to use the sort feature in the top_hits aggregation to get the highest revision.
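To make the sort suggestion concrete, here is a minimal sketch of such a request body, built as a plain Python dict. The field names `Author.keyword` and `Title.keyword` are assumptions about the mapping (keyword sub-fields), and the aggregation names are made up for illustration; only `Revision` and the column names come from the table in the question.

```python
import json

def latest_revision_query(group_size=100):
    """Build a request body: one bucket per author, then per title,
    keeping only the hit with the highest Revision in each bucket."""
    return {
        "aggs": {
            "by_author": {
                # Assumed keyword sub-field; adjust to your mapping.
                "terms": {"field": "Author.keyword", "size": group_size},
                "aggs": {
                    "by_title": {
                        "terms": {"field": "Title.keyword", "size": group_size},
                        "aggs": {
                            "latest": {
                                "top_hits": {
                                    "size": 1,
                                    # The sort, not a sibling max agg, picks the hit.
                                    "sort": [{"Revision": {"order": "desc"}}],
                                }
                            }
                        },
                    }
                },
            }
        }
    }

print(json.dumps(latest_revision_query(), indent=2))
```

The dict can be sent as the body of a `_search` request with whatever client you use.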
Sounds like you want another level of terms aggregation underneath title, grouping by reviewer. So you should have a hierarchy of:
terms - author
  terms - title
    terms - reviewer
      top_hits - size 1, sort by date descending
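The hierarchy above could be sketched as a request body like this. The helper `nested_terms` is hypothetical, and the field names (`Author.keyword`, `Title.keyword`, `Reviewer.keyword`) are assumptions; `PublishedDate` is taken from the table in the question as the date to sort on.

```python
def nested_terms(fields, leaf_aggs):
    """Wrap leaf_aggs in one terms aggregation per (name, field) pair,
    with the first pair ending up outermost."""
    aggs = leaf_aggs
    for name, field in reversed(fields):
        aggs = {name: {"terms": {"field": field}, "aggs": aggs}}
    return aggs

# Assumed field names; adjust to your mapping.
levels = [
    ("author", "Author.keyword"),
    ("title", "Title.keyword"),
    ("reviewer", "Reviewer.keyword"),
]

# Innermost level: the single most recent document per reviewer.
leaf = {
    "last_comment": {
        "top_hits": {"size": 1, "sort": [{"PublishedDate": {"order": "desc"}}]}
    }
}

body = {"aggs": nested_terms(levels, leaf)}
```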
If this ends up being a lot of data for one request you might want to consider using the composite aggregation and using the after param to break it into multiple requests.
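A rough sketch of that paging approach, assuming the same hypothetical field names as before. `search` here is a stand-in for whatever function sends a request body to Elasticsearch and returns the response as a dict; it is not a real client call.

```python
def composite_page(after_key=None, page_size=500):
    """Request body for one page of author/title/reviewer groups."""
    composite = {
        "size": page_size,
        "sources": [
            # Assumed keyword fields; adjust to your mapping.
            {"author": {"terms": {"field": "Author.keyword"}}},
            {"title": {"terms": {"field": "Title.keyword"}}},
            {"reviewer": {"terms": {"field": "Reviewer.keyword"}}},
        ],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "groups": {
                "composite": composite,
                "aggs": {
                    "last": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{"PublishedDate": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }

def iter_buckets(search, page_size=500):
    """Yield every bucket, feeding each response's after_key into the next request."""
    after = None
    while True:
        groups = search(composite_page(after, page_size))["aggregations"]["groups"]
        if not groups["buckets"]:
            return
        yield from groups["buckets"]
        after = groups.get("after_key")
        if after is None:
            return
```

Each request then only has to build `page_size` buckets, at the cost of several round trips.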
Are you sorting in the right direction?
Is revision=1 the highest recorded revision for a given author/title/reviewer or are you saying a revision=2 record is missing for one of these combos?
Ah OK.
I misunderstood the requirement. I assumed you wanted the last comment from each reviewer.
I now see you want comments from all reviewers on the last revision.
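One way to express "all documents on the last revision" is a terms aggregation on the revision that keeps only the highest key, with top_hits underneath. This is a sketch under assumptions, not necessarily the exact query from this thread: the field names `Author.keyword` and `Title.keyword` and the aggregation names are illustrative, and `max_docs` should stay within your index's limit for top_hits size.

```python
def last_revision_all_docs(max_docs=100):
    """Per author/title: keep only the highest Revision bucket,
    then return up to max_docs documents from that bucket."""
    return {
        "aggs": {
            "author": {
                "terms": {"field": "Author.keyword"},
                "aggs": {
                    "title": {
                        "terms": {"field": "Title.keyword"},
                        "aggs": {
                            "last_revision": {
                                # size 1 + descending key order = highest revision only
                                "terms": {
                                    "field": "Revision",
                                    "size": 1,
                                    "order": {"_key": "desc"},
                                },
                                "aggs": {
                                    "docs": {"top_hits": {"size": max_docs}}
                                },
                            }
                        },
                    }
                },
            }
        }
    }
```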
Exactly what I needed!
I was evaluating whether ES can do this for the new project; if not, the team will go with a relational database. But now ES is more likely to be chosen!
The learning curve of building ES queries is probably the main drawback compared to an RDBMS (hopefully ES SQL support will mature soon). But this is offset by the professionalism and swift responses of the community. Thank you very much!
For a search, Elasticsearch normally returns the top 10 matching documents plus any aggregations (think of your typical e-commerce search results with the top 10 matching products and summaries of options for refining by price/colour/brand).
In your scenario you only want the aggregations and can dispense with the top-matching documents (so, size=0).
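As a small illustration: with `"size": 0` the top-level hits list comes back empty and the documents you care about sit inside the aggregation buckets. The response below is hand-written from the table in the question to show the shape, not real output, and the aggregation names and `collect_top_hits` helper are hypothetical.

```python
request_body = {
    "size": 0,  # skip the top-matching documents, keep only aggregations
    "aggs": {
        "by_author": {
            "terms": {"field": "Author.keyword"},  # assumed keyword field
            "aggs": {
                "latest": {
                    "top_hits": {"size": 1, "sort": [{"Revision": {"order": "desc"}}]}
                }
            },
        }
    },
}

# Hand-written example of the response shape (illustrative only).
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "by_author": {
            "buckets": [
                {
                    "key": "James",
                    "latest": {"hits": {"hits": [
                        {"_source": {"Author": "James", "Title": "I wonder why", "Revision": 2}}
                    ]}},
                }
            ]
        }
    },
}

def collect_top_hits(response, path, leaf):
    """Follow the bucket aggregations named in path and gather the
    _source of every hit stored under the top_hits aggregation leaf."""
    docs = []

    def walk(node, names):
        if not names:
            docs.extend(hit["_source"] for hit in node[leaf]["hits"]["hits"])
            return
        for bucket in node[names[0]]["buckets"]:
            walk(bucket, names[1:])

    walk(response["aggregations"], list(path))
    return docs

latest_docs = collect_top_hits(sample_response, ["by_author"], "latest")
```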