Show first record of each group

Hi guys,

I have been trying to find a way to perform a group by and then obtain the first document of each group. The result set should also be filterable and sortable, and each result should include the number of documents in its group.

Wondering if what I am looking for can be done in Elasticsearch.

For example, given the following dataset:

{id:"1", group:"A", description:"abc", status:"COMPLETED"},
{id:"2", group:"A", description:"def", status:"PENDING"},
{id:"3",  group:"B", description:"ghi", status:"COMPLETED"},
{id:"4",  group:"B", description:"jkl", status:"COMPLETED"},
{id:"5", group:"C", description:"mno", status:"COMPLETED"}

Is there a way to form a query to obtain something along the lines of:

{
  ...
  "hits": 3,
  ...
  "_source": [
    {id:"1", group:"A", description:"abc", group_count: 2, group_has_pending: true},
    {id:"3", group:"B", description:"ghi", group_count: 2, group_has_pending: false},
    {id:"5", group:"C", description:"mno", group_count: 1, group_has_pending: false}
  ]
}

Searching for "def" should yield results along the lines of:

{
  ...
  "hits": 1,
  ...
  "_source": [
    {id:"2", group:"A", description:"def", group_count: 2, group_has_pending: true}
  ]
}

I have tried collapse, but it still does not fully satisfy my requirement: I cannot get the total count of documents in a group regardless of the filter, and I can't seem to include group_has_pending in the result set.

The closest example of what I want to achieve is the search in the Gmail web application. (Note that the message count within each conversation group does not change even when a filter is applied, and they manage to paginate the result set as well.)

Does anyone have an idea how to achieve this?
Or has anyone implemented a similar use case, like the Gmail example?

Have you considered using a transform to create a separate index that has one document per group with the data you require?
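
As a rough illustration of the transform idea, here is a sketch of what the request body for `PUT _transform/<id>` might look like, written as a Python dict. The index names `items` and `items_by_group` are hypothetical, and the sketch assumes `id`, `group` and `status` are mapped as keyword fields. It only builds and prints the body; it does not contact a cluster.

```python
import json


def build_group_transform(source_index: str, dest_index: str) -> dict:
    """Build a pivot transform body that emits one document per group."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            # One output document per distinct value of the "group" field.
            "group_by": {"group": {"terms": {"field": "group"}}},
            "aggregations": {
                # Number of documents in the group, independent of any search filter.
                "group_count": {"value_count": {"field": "id"}},
                # Number of PENDING documents in the group.
                "pending": {"filter": {"term": {"status": "PENDING"}}},
            },
        },
    }


# Hypothetical index names, used only for illustration.
body = build_group_transform("items", "items_by_group")
print(json.dumps(body, indent=2))
```

If this behaves as I expect, each document in the destination index would carry `group_count` and a `pending` count, so `group_has_pending` could be derived as `pending > 0`.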

Elasticsearch does not change the source, so as far as I know it is not possible to get the data in the form you specified. You may be able to get just the latest document per group, but that could get expensive at scale and does not seem to match your requirement.
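
Since the source is returned unchanged, the extra fields would have to be merged in on the client side. The sketch below simulates that in plain Python using the dataset from the question. It is not Elasticsearch code: `summarize_groups` stands in for a per-group summary (for example from a separate index), `first_per_group` stands in for a collapsed search, and all function names are made up for illustration.

```python
from collections import defaultdict

DOCS = [
    {"id": "1", "group": "A", "description": "abc", "status": "COMPLETED"},
    {"id": "2", "group": "A", "description": "def", "status": "PENDING"},
    {"id": "3", "group": "B", "description": "ghi", "status": "COMPLETED"},
    {"id": "4", "group": "B", "description": "jkl", "status": "COMPLETED"},
    {"id": "5", "group": "C", "description": "mno", "status": "COMPLETED"},
]


def summarize_groups(docs):
    """Per-group stats over ALL documents, ignoring any search filter."""
    stats = defaultdict(lambda: {"group_count": 0, "group_has_pending": False})
    for doc in docs:
        entry = stats[doc["group"]]
        entry["group_count"] += 1
        entry["group_has_pending"] = (
            entry["group_has_pending"] or doc["status"] == "PENDING"
        )
    return dict(stats)


def first_per_group(docs, text=None):
    """Keep the first matching document of each group, in input order."""
    first = {}
    for doc in docs:
        if text is not None and text not in doc["description"]:
            continue
        first.setdefault(doc["group"], doc)
    return list(first.values())


def search(docs, text=None):
    """Merge the group stats into the first matching document of each group."""
    stats = summarize_groups(docs)
    return [
        {
            "id": doc["id"],
            "group": doc["group"],
            "description": doc["description"],
            **stats[doc["group"]],
        }
        for doc in first_per_group(docs, text)
    ]


for row in search(DOCS, "def"):
    print(row)
```

The point of the split is that the group stats are computed without the filter, while the filter only decides which document represents each group.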
