Subset match query

Hi everyone,

Thanks for reading.

Here's my problem. I have the following documents:

"course #1": {
    "requirements": [
        {
            "name": "AWS EC2 certification",
            "type": "certification"
        },
        {
            "name": "Python Advanced",
            "type": "certification"
        },
        {
            "name": "Critical Thinking",
            "type": "skill",
        }
}

"course #2": {
    "requirements": [
        {
            "name": "Ruby on Rails advanced",
            "type": "certification"
        },
        {
            "name": "Critical Thinking",
            "type": "skill",
        }
}

I want the users to find the courses where they match all the requirements, for example:

The user that has "Critical Thinking", "Ruby on Rails advanced" and "Python Advanced", should get only the course #2. In other words, I want to find the courses where the requirements are a subset of the user skills.

I saw posts related to this issue but are very old and I'm hoping to find a clean solution.

Thanks. I hope you have some inputs.

Start simple with terms and terms_set query. After that you can add a nested layer on top.

Example, let's map requirements as simple keyword field:

PUT test_index
{
  "mappings": {
    "properties": {
      "requirements": {
        "type": "keyword"
      }
    }
  }
}

Indexing some docs:

POST test_index/_bulk
{ "index" : { "_id" : "course1" } }
{"requirements": ["A", "B", "C"]}
{ "index" : { "_id" : "course2" } }
{"requirements": ["D", "C"]}

The query below returns you only courses that satisfy all your requirements.

GET test_index/_search
{
 "query": {
   "terms_set" : {
     "requirements" : {
       "terms" : ["C", "D"],
       "minimum_should_match_script": {
          "source": "params.num_terms"
        }
     }
   }
 }
}

Hi Mayya, thanks for the quick response.
The solution you posted is working fine, but when I tried to add a nested layer on top, it didn't work. So I was wondering if the terms_set works with nested objects, I looked this post but it has no answers Terms_set within a nested query, and what is happening to me is exactly that. When I used a nested query, there are no results in the response.

It looks like this:

{
        nested: {
          path: 'requirements',
          query: {
            terms_set: {
              'requirements.name': {
                terms: ["Python Advanced", "Critical Thinking"],
                minimum_should_match_script: {
                  source: "params.num_terms"
                },
              },
            }
          }
        }
}

And it works when I copy/paste the one you posted

I've realized that I have provided a wrong solution with terms_set query, as it would not returned subset, but rather a superset; that is you would not be able to find the courses where the requirements are a subset of the users skills, but rather find the courses where the user skills are the subset of course requirements.

Again, disregarding a nested part, to find a correct subset, you would need to writer a custom script that would compare if each course requirement is within user skills, something like this:

GET test_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": """
            for (int i = 0; i < doc['requirements'].length; i++) {
              boolean found = false;
              for(int j = 0; j < params.user_skills.length; j++) { 
                if (doc['requirements'][i] == params.user_skills[j]) {
                  found = true;
                  break;
                }
              }  
              if (found == false) return false;
            }
            return true;
            """,
            "params": {
              "user_skills": ["C", "D"]
            }
          }
        }
      }
    }
  }
}

You can't use this for nested objects, as each nested object is a separate document. Some possible solutions I can see are:

  1. Don't use nested, and just use "requirements" as a keyword field with multiple values (the way I did in my example)
  2. You can copy nested values in a root/parent object with include_in_parent and include_in_root options, and run queries on a corresponding root field.

May be there are some other clever ways to do it for nested objects, but nothing comes to my mind right now

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.