Grouping match_phrase search results by match text


(Cole Maclean) #1

Hi, I have a question regarding grouping results by match text.

Given a phrase match query like this:

{
    'match_phrase': {
        'text.english': {
            'query': "The fox jumped over the wall",
            'phrase_slop': 4,
        }
    }
}

Is there a way to group the results by the exact text that matched?

For example, if one document's text.english contains "The quick fox jumps over the small wall" and three documents contain "The lazy fox jumped over the big wall", I would like to end up with those two groups of results.

I'm OK with running multiple queries and doing some processing outside of ES, but I need a solution that performs reasonably over a large set of documents. Ideally I'm hoping there's a way to do this using aggregations that I've missed.

The best solution I've come up with is to run the query above with highlighting enabled, parse the highlights out of every result, and group the results by highlight content. This is fine for very small result sets, but for a result set of 1000+ documents it is prohibitively slow.
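For reference, the highlight-and-group approach looks roughly like the sketch below. It is a minimal illustration, not my exact code: the helper names (build_request, match_key, group_hits) are made up for this post, it assumes the default <em> highlight tags, and it uses "slop" as the parameter name where my query above has 'phrase_slop'. The group key is the text between the first and last highlighted term in a fragment, so a field containing several separate matches would be merged into one key.

```python
import re
from collections import defaultdict

FIELD = "text.english"

# Greedy match: from the first <em> to the last </em> in a fragment.
EM_SPAN = re.compile(r"<em>.*</em>", re.DOTALL)
EM_TAG = re.compile(r"</?em>")


def build_request(phrase, slop=4, size=1000):
    """Search body: phrase query plus highlighting on the same field.

    number_of_fragments=0 asks for the whole field value as one fragment.
    """
    return {
        "size": size,
        "query": {"match_phrase": {FIELD: {"query": phrase, "slop": slop}}},
        "highlight": {"fields": {FIELD: {"number_of_fragments": 0}}},
    }


def match_key(fragment):
    """Return the normalized matched text of a fragment, or None."""
    span = EM_SPAN.search(fragment)
    if span is None:
        return None
    text = EM_TAG.sub("", span.group(0))
    # Lowercase and collapse whitespace so equal matches share a key.
    return " ".join(text.lower().split())


def group_hits(response):
    """Map each matched text to the list of document ids containing it."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(FIELD, []):
            key = match_key(fragment)
            if key is not None:
                groups[key].append(hit["_id"])
    return dict(groups)


# Hand-written sample shaped like a search response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {FIELD: [
                "The quick <em>fox</em> <em>jumps</em> <em>over</em> the small <em>wall</em>"]}},
            {"_id": "2", "highlight": {FIELD: [
                "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>"]}},
            {"_id": "3", "highlight": {FIELD: [
                "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>"]}},
            {"_id": "4", "highlight": {FIELD: [
                "The lazy <em>fox</em> <em>jumped</em> <em>over</em> the big <em>wall</em>"]}},
        ]
    }
}

groups = group_hits(sample_response)
```

The grouping itself is cheap. The cost is in fetching and highlighting every hit, which is why this does not scale for me.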

Thanks in advance for any suggestions!

