Match only N queries in a large boolean query for a single result

I am constructing a query that looks like this:

Bool(should=[Bool(_name='F002690_sf3789834989482_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S338492778_HGSC', 'S59=89483927_HGSC']), Terms(samples_num_alt_2=['S384938978_HGSC', 'S545840927_HGSC']), Terms(samples=['S9043902778_HJKC', 'S5940394027_HJKC'])])]), Bool(_name='F002706_sf67890290_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S3904340491_HGSC', 'S3950490_HGSC']), Terms(samples_num_alt_2=['S35904991_HGSC', 'S39504950_HGSC']), Terms(samples=['S3905401_HGSC', 'S35004900_HGSC'])])]), Bool(_name='F002819_sf5994095721_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S20495049_HGSC', 'S549905499_HGSC', 'S57190549_HGSC', 'S710905949_HGSC']), ...

It is a very long one. I need to return only those results for which at least N of the _name queries matched (as reported in matched_queries). I already posted a related question, which was answered.

So I tried the minimum_should_match parameter, like so (showing only the end of this large query; note minimum_should_match at the very end):

... Bool(_name='F007276_sf3014344550_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S1643442_HGSC']), Terms(samples_num_alt_2=['S1605452_HGSC']), Terms(samples=['S1545142_HGSC'])])]), Bool(_name='F007254_sf545435_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S2600119_HGSC']), Terms(samples_num_alt_2=['S2605459_HGSC']), Terms(samples=['S2545119_HGSC'])])]), Bool(_name='F007254_sf46454547_wgs_hgsc', must=[Bool(should=[Terms(samples_num_alt_1=['S45854555_HGSC']), Terms(samples_num_alt_2=['S44545155_HGSC']), Terms(samples=['S4545155_HGSC'])])])])], minimum_should_match=10)

It runs fine, but it does not produce the desired result: hits are not limited by the number of matched _name queries. Here is how the query is constructed in elasticsearch-dsl:

for family_samples_q in qs:
  genotypes_q |= family_samples_q

Search().query('bool', filter=[genotypes_q], minimum_should_match=10)

I suspect the filtering is wrong because minimum_should_match is applied to the outer bool (the one that holds filter) rather than to the inner one (genotypes_q, which holds the should clauses), but I am not sure how to change that, if that is indeed the case.
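To make the scoping idea concrete, here is a minimal sketch of the structure I think I need, written as plain dicts so it does not depend on any client library. It assumes that minimum_should_match only counts the should clauses of the bool it is set on. The helper names build_family_query and build_body are made up for illustration, and the sample IDs are taken from the query above.

```python
def build_family_query(name, alt1, alt2, samples):
    """One named per-family clause, mirroring Bool(_name=..., must=[Bool(should=[...])])."""
    return {
        "bool": {
            "_name": name,
            "must": [
                {
                    "bool": {
                        "should": [
                            {"terms": {"samples_num_alt_1": alt1}},
                            {"terms": {"samples_num_alt_2": alt2}},
                            {"terms": {"samples": samples}},
                        ]
                    }
                }
            ],
        }
    }


def build_body(family_queries, n):
    """Put minimum_should_match on the inner bool that owns the should clauses."""
    inner = {"bool": {"should": family_queries, "minimum_should_match": n}}
    # The outer bool only wraps the inner one in filter context; it has no
    # should clauses of its own, so it carries no minimum_should_match.
    return {"query": {"bool": {"filter": [inner]}}}


family_queries = [
    build_family_query(
        "F007276_sf3014344550_wgs_hgsc",
        ["S1643442_HGSC"], ["S1605452_HGSC"], ["S1545142_HGSC"],
    ),
    build_family_query(
        "F007254_sf545435_wgs_hgsc",
        ["S2600119_HGSC"], ["S2605459_HGSC"], ["S2545119_HGSC"],
    ),
]
body = build_body(family_queries, n=2)
```

In elasticsearch-dsl, the equivalent would presumably be to build the inner query with Q('bool', should=qs, minimum_should_match=10) instead of chaining |, and then pass it as Search().query('bool', filter=[inner]).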

Basically, I need to return only those results for which N or more of the _name queries matched. The condition is per result: I am not interested in the whole query matching N results, but in each individual result matching at least N of the named queries.
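To illustrate the per-result condition, here is a sketch of a client-side check on a raw search response. It assumes each hit carries a matched_queries list with the names of the _name clauses it matched. The function name hits_with_min_matches and the sample response are hypothetical. This is only a fallback for illustration: it filters after the search has run, so totals and pagination would not reflect it.

```python
def hits_with_min_matches(response, n):
    """Keep only the hits that matched at least n named queries."""
    hits = response.get("hits", {}).get("hits", [])
    # A hit that matched no named query may have no matched_queries key at all.
    return [h for h in hits if len(h.get("matched_queries", [])) >= n]


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["F007276_sf3014344550_wgs_hgsc"]},
            {
                "_id": "2",
                "matched_queries": [
                    "F007276_sf3014344550_wgs_hgsc",
                    "F007254_sf545435_wgs_hgsc",
                ],
            },
            {"_id": "3"},
        ]
    }
}

kept_ids = [h["_id"] for h in hits_with_min_matches(sample_response, 2)]
```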
