Aggregation queries question

Hello,

I have roughly 68 million documents across four indices that have identical mappings, for example:
{
'log_time': '2017-01-06T09:23:00Z',
'extra': {},
'used': False,
'who': u'Bruno',
'interaction_id': u'Fifty-five',
'what': u'message',
'value': u'=== Closed Without Comment ==='
}

I am looking to return a group of interaction_ids where each group has at least one of the following filters are true: At least one document where 'what' is 'Status' and the value is 'New',
At least one document where 'what' is 'Status' and the value is 'Closed' and at least one document where 'what' is 'Queue' and 'value' is 'Breakfix'.

I have tried combinations of a nested filtering aggregation, straight up pipeline bucket aggregation, bucketing by the filter conditions and then attempting to a child aggregation where it buckets by interaction_id all to no avail.

Would anyone have any suggestions what I might try next?

With Thanks!

How many unique interaction_ids are there?

About 3 million

Sorry replied to the thread and not to you: Its roughly 3 million unique interaction_ids

Joins on high cardinality fields are problematic for any distributed system. Here’s why and shows a way to work round it: https://youtu.be/yBf7oeJKH2Y

That's a great resource. Thanks Mark!

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.