We index emails along with their attachments. Our searches are paginated and fetch 100 results per page.
By default we sort by score. As a result, one hit attachment (A1) of an email (E1) can land on page 1 while another hit attachment (A2) of the same email lands on page 50 because of its lower score.
However, we display each family together, showing only the items that are hits. So when we show A1, we need to show A1 and A2 together under E1, since both of them are hits.
We do not want to iterate through all the results, since that would hurt search performance and memory usage and would also defeat the purpose of pagination.
Another approach would be to take the results of a page, collect all the attachments of E1, and run a second query that restricts the hits to those attachments. But this doesn't seem ideal either.
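
To make this second approach concrete, here is a rough sketch of what we have in mind. The field name `family_id`, the shape of the hit dictionaries, and the helper names are all hypothetical, and the actual call to the search engine is left out. It only shows how a follow-up filter clause could be built from one page of hits and how the combined hits could then be grouped for display.

```python
from collections import defaultdict


def family_filter(page_hits, field="family_id"):
    """Build a Lucene-style filter clause covering every family on this page.

    `field` is a hypothetical field holding the parent email's id.
    """
    # dict.fromkeys de-duplicates while preserving first-seen order.
    family_ids = list(dict.fromkeys(hit[field] for hit in page_hits))
    if not family_ids:
        return None
    quoted = " OR ".join('"{}"'.format(fid) for fid in family_ids)
    return "{}:({})".format(field, quoted)


def group_by_family(hits, field="family_id"):
    """Group hits by family, keeping the order in which families first appear."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[field]].append(hit)
    return dict(groups)


# Made-up hits standing in for page 1 of the original query.
page_1 = [
    {"id": "A1", "family_id": "E1", "score": 9.2},
    {"id": "B1", "family_id": "E7", "score": 8.5},
]

# The second query would be the original query plus this filter clause.
print(family_filter(page_1))  # family_id:("E1" OR "E7")
```

The drawback is visible here too: every page needs a second round trip, and a family pulled forward onto page 1 will show up again when A2 is reached on page 50 unless we also track which families were already displayed.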
Has anyone else seen or solved a similar problem, or does anyone have suggestions on how to approach it? Maybe we could add something to our schema so that members of the same family stay together, similar to an ORDER BY in MySQL where the first sort key is the score and the second is the family id?
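
To illustrate the ordering we are after, here is a small in-memory sketch with made-up hits (the `family_id` field and the function names are hypothetical, and this is plain Python, not search-engine code). It suggests that a literal two-level sort on score and then family id would only use the family id as a tie-breaker, so what we probably want is to rank each family by its best-scoring hit and keep its members together.

```python
# Made-up hits; A1 and A2 belong to the same email E1.
hits = [
    {"id": "A1", "family_id": "E1", "score": 9.2},
    {"id": "B1", "family_id": "E7", "score": 8.5},
    {"id": "C1", "family_id": "E3", "score": 5.0},
    {"id": "A2", "family_id": "E1", "score": 1.3},
]


def sort_score_then_family(hits):
    """Literal two-level sort: score descending, family id as tie-breaker."""
    return sorted(hits, key=lambda h: (-h["score"], h["family_id"]))


def sort_by_family_best_score(hits):
    """Rank families by their best hit, then keep each family's hits together."""
    best = {}
    for h in hits:
        fid = h["family_id"]
        best[fid] = max(best.get(fid, float("-inf")), h["score"])
    return sorted(
        hits,
        key=lambda h: (-best[h["family_id"]], h["family_id"], -h["score"]),
    )


print([h["id"] for h in sort_score_then_family(hits)])     # ['A1', 'B1', 'C1', 'A2']
print([h["id"] for h in sort_by_family_best_score(hits)])  # ['A1', 'A2', 'B1', 'C1']
```

In the first ordering A2 still ends up far from A1; in the second, A2 follows A1 directly. Whether the search engine can produce the second ordering efficiently at query time is exactly what we are asking about.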
Thanks,
Sau