Filter by count of inner_hits?

I'm using ES to search movements of baroque music, so someone can find e.g. music for flute, violin, and soprano or for 2 violins and soprano. The following works:

  1. Multilingual instrument names: "flûte à bec" should find music for recorder.
  2. Generic instrument names: "violin" should find music for viola d'amore, but not vice versa.
  3. Meta instruments: "violin" should find music for "dessus" (high instrument), but rank it lower.
  4. Boolean instruments: "violin AND flute" should find music for both of those, but shouldn't find something for "violin XOR flute".

I solved this with nested docs:

{
  "movement_title": "...",
  "instrumentations": [
    {
      "meta_instrument": "dessus",
      "instrument_role_options": [
        {
          "instrument_name": "violon",
          "instrument_name_generic": "violin"
        },
        {
          "instrument_name": "flûte allemande",
          "instrument_name_generic": "flute"
        }
      ]
    },
    {
      "meta_instrument": null,
      "instrument_role_options": [
        {
          "instrument_name": "soprano",
          "instrument_name_generic": "soprano"
        }
      ]
    }
  ]
}
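
For illustration, here is a minimal sketch of how a "violin AND flute" query body against this document shape could be built. It assumes that both `instrumentations` and `instrumentations.instrument_role_options` are mapped as `nested`, and that matching on `instrument_name_generic` is enough for the example. The helper names `instrument_clause` and `build_query` are hypothetical, and the sketch ignores the synonym and ranking logic from points 1 to 3.

```python
import json
from typing import Dict, List


def instrument_clause(generic_name: str) -> Dict:
    """One nested clause: some instrumentation has a role option for this instrument."""
    option_path = "instrumentations.instrument_role_options"
    return {
        "nested": {
            "path": "instrumentations",
            "query": {
                "nested": {
                    "path": option_path,
                    "query": {
                        "match": {
                            option_path + ".instrument_name_generic": generic_name
                        }
                    },
                }
            },
            # inner_hits names must be unique within one search request,
            # so the instrument name doubles as the inner_hits name here.
            "inner_hits": {"name": generic_name, "size": 10},
        }
    }


def build_query(instruments: List[str]) -> Dict:
    """AND together one nested clause per distinct requested instrument."""
    unique = sorted(set(instruments))
    return {"query": {"bool": {"must": [instrument_clause(n) for n in unique]}}}


if __name__ == "__main__":
    print(json.dumps(build_query(["violin", "flute"]), indent=2))
```

Note that a query like this only checks that each instrument matches somewhere. Both clauses can be satisfied by the same "dessus" instrumentation, which is exactly where counting becomes a problem.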

However, I'm stuck on these:

  5. Multiple instruments: "violin and flute" should find anything for violin and flute, plus anything with "2 dessus" (2 high instruments) whose range is OK for violin and flute.
  6. Multiple instrument ranges: "playable on 2 flutes" should find music for 2 violins that is within the flute's range.

I considered storing counts like

    { "instrument": "flute", "count": 2 },
    { "instrument": "violin", "count": 1 },
    { "instrument": "dessus", "count": 3 },

But that won't work for #5 or even #3.

Can anyone think of a way to structure this better so that I can filter by counts of nested items, like "2 flutes"?

Otherwise, is it possible to write a script filter that operates on analyzed text fields? (It looks like there are no doc_values on analyzed text.) I saw that Painless even has a Dictionary type that could help me count everything necessary all at once, but I thought there must be a better way…

Thanks for any ideas!

Ben

It seems like the best option is to filter by inner_hits on the client side — that way ES can handle the complex synonym logic. But then I can't get a count of total results, and I'll run into performance problems as the dataset grows. And I don't think scripts can access inner_hits?
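
To make the client-side idea concrete, here is a rough sketch of such a filter. It assumes each requested instrument was searched with its own nested query on `instrumentations`, with `inner_hits` named after the instrument, so every inner hit carries the index of the matching instrumentation in `_nested.offset`. The function names are hypothetical. A movement is kept only if every requested part (for example `["flute", "flute"]` for "2 flutes") can be assigned its own distinct instrumentation. Since `inner_hits` return only 3 hits by default, their `size` would need to be raised for this to see all matches.

```python
from typing import Dict, List, Set


def matching_offsets(hit: Dict, name: str) -> Set[int]:
    """Indexes of the instrumentations that matched the inner_hits block `name`."""
    inner = hit.get("inner_hits", {}).get(name, {})
    return {h["_nested"]["offset"] for h in inner.get("hits", {}).get("hits", [])}


def can_assign(requested: List[str], candidates: Dict[str, Set[int]]) -> bool:
    """True if each requested part can take a different instrumentation.

    Plain backtracking; ensembles are small, so this stays cheap per hit.
    """
    used: Set[int] = set()

    def place(i: int) -> bool:
        if i == len(requested):
            return True
        for offset in sorted(candidates.get(requested[i], set())):
            if offset not in used:
                used.add(offset)
                if place(i + 1):
                    return True
                used.remove(offset)
        return False

    return place(0)


def filter_hits(response: Dict, requested: List[str]) -> List[Dict]:
    """Keep only the top-level hits whose instrumentations cover all requested parts."""
    kept = []
    for hit in response["hits"]["hits"]:
        candidates = {name: matching_offsets(hit, name) for name in set(requested)}
        if can_assign(requested, candidates):
            kept.append(hit)
    return kept
```

This does nothing about the missing total count or the cost of paging through every candidate; it only shows the counting step.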
