Filter by count of inner_hits?

bkazez · April 26, 2021, 8:08pm

I'm using ES to search movements of baroque music, so someone can find e.g. music for flute, violin, and soprano or for 2 violins and soprano. The following works:

1. Multilingual instrument names: "flûte à bec" should find music for recorder.
2. Generic instrument names: "violin" should find music for viola d'amore, but not vice versa.
3. Meta instruments: "violin" should find music for "dessus" (high instrument), but rank it lower.
4. Boolean instruments: "violin AND flute" should find music for both of those, but shouldn't find something for "violin XOR flute".

I solved this with nested docs:

{
  "movement_title": "...",
  "instrumentations": [
    {
      "meta_instrument": "dessus",
      "instrument_role_options": [
        {
          "instrument_name": "violon",
          "instrument_name_generic": "violin"
        },
        {
          "instrument_name": "flûte allemande",
          "instrument_name_generic": "flute"
        }
      ]
    },
    {
      "meta_instrument": null,
      "instrument_role_options": [
        {
          "instrument_name": "soprano",
          "instrument_name_generic": "soprano"
        }
      ]
    }
  ]
}
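For anyone reading along, here is a rough sketch of how the "violin AND flute" case (#4) could be expressed against a document shaped like the one above. It assumes `instrumentations` is mapped as a `nested` field and that `instrument_role_options` is a plain object array inside it; the thread doesn't show the actual mapping, so treat both as guesses. The helper name `build_and_query` and the `req_N` inner_hits names are made up for illustration.

```python
import json


def build_and_query(instruments, path="instrumentations"):
    """Build one nested clause per requested instrument, combined with AND.

    Assumption: `path` is mapped as a nested field. Each clause gets its
    own inner_hits name, because inner_hits names must be unique within
    a single search request.
    """
    field = f"{path}.instrument_role_options.instrument_name_generic"
    must = []
    for i, name in enumerate(instruments):
        must.append({
            "nested": {
                "path": path,
                "query": {"match": {field: name}},
                # Hypothetical naming scheme: req_0, req_1, ...
                "inner_hits": {"name": f"req_{i}", "size": 10},
            }
        })
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(json.dumps(build_and_query(["violin", "flute"]), indent=2))
```

Note that nothing here forces the two clauses to match different nested docs: a request for `["flute", "flute"]` would be satisfied by a single flute slot, which is exactly the counting problem described below.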

However, I'm stuck on these:

5. Multiple instruments: "violin and flute" should find anything for violin and flute, plus anything with "2 dessus" (2 high instruments) but with range OK for violin and flute.
6. Multiple instrument ranges: "playable on 2 flutes" should find music for 2 violins that is within the flute's range.

I considered storing counts like

    { "instrument": "flute", "count": 2 },
    { "instrument": "violin", "count": 1 },
    { "instrument": "dessus", "count": 3 },

But that won't work for #5 or even #3.

Can anyone think of a way to structure this better so that I can filter by counts of nested items, like "2 flutes"?

Otherwise, is it possible to write a script filter that operates on analyzed text fields? (It looks like there are no doc_values on analyzed text.) I saw that Painless even has a Dictionary type that could help me count everything necessary all at once, but I thought there must be a better way…

Thanks for any ideas!

Ben

bkazez · April 27, 2021, 7:16pm

It seems like the best option is to filter by inner_hits on the client side — that way ES can handle the complex synonym logic. But then I can't get a count of total results, and I'll run into performance problems as the dataset grows. And I don't think scripts can access inner_hits?
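To make the client-side idea concrete, here is a minimal sketch of the check, not a tested solution. It assumes each requested instrument was sent as its own nested clause with an inner_hits name of the form `req_0`, `req_1`, ... (a naming scheme invented here), and that each inner hit carries the usual `_nested.offset` identifying which `instrumentations` entry matched. The question then becomes whether every request can be given a different slot, which is a small bipartite matching problem. The function names are hypothetical.

```python
def can_assign(candidates):
    """Return True if every request can take a distinct slot.

    candidates[i] is the set of slot offsets that could satisfy request i.
    Uses simple augmenting-path bipartite matching.
    """
    slot_owner = {}  # slot offset -> index of the request holding it

    def try_place(req, seen):
        for slot in sorted(candidates[req]):
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or move the current owner somewhere else.
            if slot not in slot_owner or try_place(slot_owner[slot], seen):
                slot_owner[slot] = req
                return True
        return False

    return all(try_place(req, set()) for req in range(len(candidates)))


def slot_offsets(hit, name):
    """Collect nested offsets from one named inner_hits section of a hit."""
    inner = hit.get("inner_hits", {}).get(name, {})
    return {h["_nested"]["offset"] for h in inner.get("hits", {}).get("hits", [])}


def hit_satisfies(hit, n_requests):
    """Check one search hit against n_requests clauses named req_0..req_{n-1}."""
    return can_assign([slot_offsets(hit, f"req_{i}") for i in range(n_requests)])
```

With this, "2 flutes" against a movement that has only one flute-capable slot gives `can_assign([{0}, {0}])`, which is `False`, while "violin and soprano" against the example document gives `can_assign([{0}, {1}])`, which is `True`. One caveat: inner_hits are truncated to their `size`, so the offsets seen client-side may be incomplete for movements with many matching slots.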

system · May 25, 2021, 7:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
