Filtering nested objects

Hi, I'm trying to build a system that stores a user's phone book in a nested field, where each entry has a user_id and a user_gender. I am trying to filter on gender in the search results, but I only get the whole nested object back in the response to my query.
I'm wondering whether there is any mechanism that lets you filter the array of nested objects and output specific fields.
I'm using Elasticsearch 7.16.
Below is the sample data for one user:

"_source" : {
          "user_id" : 1,
          "user_gender" : 1,
          "user_location" : {
            "lat" : 33.765,
            "lon" : 74.444
          },
          "phone_book" : [
            {
              "user_id" : 2,
              "user_gender" : 1,
              "user_location" : {
                "lat" : 33.665,
                "lon" : 74.444
              }
            },
            {
              "user_id" : 3,
              "user_gender" : 0,
              "user_location" : {
                "lat" : 33.755,
                "lon" : 74.444
              }
            },
            {
              "user_id" : 4,
              "user_gender" : 0,
              "user_location" : {
                "lat" : 33.755,
                "lon" : 74.444
              }
            },
            {
              "user_id" : 5,
              "user_gender" : 1,
              "user_location" : {
                "lat" : 33.755,
                "lon" : 74.434
              }
            },
            {
              "user_id" : 6,
              "user_gender" : 0,
              "user_location" : {
                "lat" : 33.655,
                "lon" : 74.414
              }
            }
          ],
          "user_relations" : {
            "name" : "user"
          }
        }
Did you find a solution to your problem? I have a similar issue, and am curious to hear if and how you managed to solve it?

Hi Michael,
No, I did not find any solution for filtering a nested data type. I guess Elasticsearch doesn't support this, as it returns either the whole document or the specific fields that we list in _source. A nested field is a single field, and all of the objects inside it belong to that particular document, so when we request this field in the result we get all of the nested objects in it.

What I did was change the mapping of phone_book from nested to join, with a reverse relation. That is, when a user A saves a contact B who is on our platform, the system indexes two documents: one with the info of user B, whose contact was just saved by A, and a child document with the info of user A, which has B as its parent. Searching and filtering on the child documents is then easy.

You may not need this reverse relation.
If you find some way to filter nested objects, please share it with me as well.

Hi @dev_test and @maph

I'm not sure what "output specific fields" means. Do you mean "output specific nested objects"?
If so, you can filter nested objects with a nested query and the inner_hits option.

If you mean filtering the fields of the nested objects, you can use the _source option of inner_hits.

{
  "query":{
    "nested":{
      "path": "phone_book",
      "query":{
        "term":{
          "phone_book.user_gender":{
            "value": 1
          }
        }
      },
      "inner_hits": {
        "_source":["phone_book.user_id", "phone_book.user_gender"]
      }
    }
  },
  "_source": ["user_id", "user_gender"]
}
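
One detail worth knowing when adapting the query above: inner_hits returns only the top three matching nested objects per document unless a size is given. Below is a minimal Python sketch that builds the same request body with an explicit size. The helper name nested_filter_query and its parameters are made up for illustration and are not part of any Elasticsearch client.

```python
import json


def nested_filter_query(path, field, value, inner_fields, size=3):
    """Build a nested query whose inner_hits list only the matching nested objects."""
    return {
        "query": {
            "nested": {
                "path": path,
                # Match nested objects whose field equals the given value
                "query": {"term": {f"{path}.{field}": {"value": value}}},
                "inner_hits": {
                    # Restrict the fields returned for each matching nested object
                    "_source": [f"{path}.{name}" for name in inner_fields],
                    # Number of matching nested objects returned per document
                    "size": size,
                },
            }
        },
        # Fields of the top-level document to return
        "_source": ["user_id", "user_gender"],
    }


body = nested_filter_query(
    "phone_book", "user_gender", 1, ["user_id", "user_gender"], size=10
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. A phone book with more matching contacts than the chosen size would still be truncated, so the size should be picked with the expected phone book length in mind.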

Thanks,

Hi @Tomo_M and @dev_test,

Thank you for both of your replies, and also happy new year to you.

I'm not able to change the mappings of the index, as it's a public data source which I'm accessing but have no control over. But glad you found a solution @dev_test !

@Tomo_M, thanks for that suggestion. I've tried using inner_hits in a nested query as well, without success. Perhaps it's because I'm a novice at Elasticsearch, but I am looking to query a particular ID with a boolean match query, and then return several different data objects, including certain objects within a nested object. I'm not interested in also having to query the nested object itself; I just want to filter its response so that I only get back some of what it contains. Does that make sense?

I can try to post an example, if necessary? Thanks for your time!

How about this? The result seems to match what you want. I used a bool "should" clause with minimum_should_match = 0 so that documents are still returned even when none of their nested objects match the filter.

GET test_nested_field/_search
{
  "query":{
    "bool":{
      "filter":[{
        "term":{
          "user_id": "1"
        }
      }],
      "should":[{
        "nested":{
          "path": "phone_book",
          "query":{
            "term":{
              "phone_book.user_gender":{
                "value": 1
              }
            }
          },
          "inner_hits": {
            "_source":["phone_book.user_id", "phone_book.user_gender"]
          }
        }
      }
      ],
      "minimum_should_match": 0
      }
  },
  "_source": ["user_id", "user_gender"]
}

This returns:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_nested_field",
        "_type" : "_doc",
        "_id" : "1Lv7L34Bf0nakUP8DwBZ",
        "_score" : 1.0,
        "_source" : {
          "user_id" : 1,
          "user_gender" : 1
        },
        "inner_hits" : {
          "phone_book" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "test_nested_field",
                  "_type" : "_doc",
                  "_id" : "1Lv7L34Bf0nakUP8DwBZ",
                  "_nested" : {
                    "field" : "phone_book",
                    "offset" : 0
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "user_id" : 2,
                    "user_gender" : 1
                  }
                },
                {
                  "_index" : "test_nested_field",
                  "_type" : "_doc",
                  "_id" : "1Lv7L34Bf0nakUP8DwBZ",
                  "_nested" : {
                    "field" : "phone_book",
                    "offset" : 3
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "user_id" : 5,
                    "user_gender" : 1
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
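
On the client side, the filtered nested objects have to be read from inner_hits rather than from the top-level _source. Here is a minimal Python sketch of that step, assuming the response has already been parsed into a dict; the helper name matching_contacts is hypothetical, and the response literal is a trimmed copy of the one above.

```python
def matching_contacts(response, path="phone_book"):
    """Map each top-level hit's _id to the _source of its inner hits under `path`."""
    results = {}
    for hit in response["hits"]["hits"]:
        # Hits without inner_hits for this path yield an empty list
        inner = hit.get("inner_hits", {}).get(path, {})
        inner_hits = inner.get("hits", {}).get("hits", [])
        results[hit["_id"]] = [h["_source"] for h in inner_hits]
    return results


# Trimmed copy of the response shown above
response = {
    "hits": {
        "hits": [
            {
                "_id": "1Lv7L34Bf0nakUP8DwBZ",
                "_source": {"user_id": 1, "user_gender": 1},
                "inner_hits": {
                    "phone_book": {
                        "hits": {
                            "hits": [
                                {"_source": {"user_id": 2, "user_gender": 1}},
                                {"_source": {"user_id": 5, "user_gender": 1}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

contacts = matching_contacts(response)
print(contacts)
```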

If you try it without success, paste your query, the response, and your desired output, and someone will help you!

Thanks for trying, Tomo!

I tried adapting the query to the index and mapping I'm looking at, but couldn't get it to work.

The nested object has the following mapping:

"attributter": {
  "type": "nested",
  "include_in_parent": true,
  "properties": {
    "sekvensnr": {
      "type": "long"
    },
    "type": {
      "type": "string"
    },
    "vaerdier": {
      "type": "nested",
      "include_in_parent": true,
      "properties": {
        "periode": {
          "properties": {
            "gyldigFra": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "gyldigTil": {
              "type": "date",
              "format": "dateOptionalTime"
            }
          }
        },
        "sidstOpdateret": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "vaerdi": {
          "type": "string"
        }
      }
    },
    "vaerditype": {
      "type": "string"
    }
  }
},

I'm sending the following request, querying for a term elsewhere in the index:

  http://distribution.virk.dk/cvr-permanent/virksomhed/_search -d '
  {
    "query":{
      "bool":{
        "filter":[{
          "term":{
            "Vrvirksomhed.cvrNummer": "35128417"
          }
        }],
        "should":[{
          "nested":{
            "path": "Vrvirksomhed.attributter",
            "query":{
              "term":{
                "Vrvirksomhed.attributter.KAPITAL":{
                  "value": 1
                }
              }
            },
            "inner_hits": {
              "_source":["Vrvirksomhed.attributter.KAPITAL"]
            }
          }
        }
        ],
        "minimum_should_match": 0
        }
    },
    "_source": ["Vrvirksomhed.attributter.KAPITAL"]
  }' | python -mjson.tool

But it returns an empty 'successful' result:

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.0,
        "hits": [
            {
                "_index": "cvr-v-20200115",
                "_type": "_doc",
                "_id": "4000333365",
                "_score": 0.0,
                "_source": {},
                "inner_hits": {
                    "Vrvirksomhed.attributter": {
                        "hits": {
                            "total": 0,
                            "max_score": null,
                            "hits": []
                        }
                    }
                }
            }
        ]
    }
}

And if I change the path, it gives me an error. But I guess it's the path that I'm somehow getting wrong?

I can't find a KAPITAL field in attributter.

Yeah, the documentation is problematic because it says that the list of attributes is divided into the following areas:

  • DIVERSE
  • KAPITALFORHOLD
  • KATEGORISERING
  • REGNSKABSPERIODE
  • TEKSTER
  • Udenlandske revisionsvirksomheder
  • HvidVask

Each of these areas then has a number of attributes, and KAPITAL should be under KAPITALFORHOLD:

For KAPITALFORHOLD, the following attributes exist:

  • BØRSNOTERET
  • KAPITAL
  • KAPITAL_DELVIST
  • KAPITALKLASSE
  • KAPITALVALUTA

Yet if I try

Vrvirksomhed.attributter.KAPITALFORHOLD.KAPITAL

I get an error message...
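
One possible reading of the mapping posted earlier: attributter only has the properties sekvensnr, type, vaerdier and vaerditype, so KAPITAL may be a value of the type field rather than a field name of its own. This is a guess and has not been checked against the live index. The Python sketch below builds a request body under that assumption. It uses a match query instead of term because the mapping does not show whether the string field is analyzed, and the helper name attribute_query is made up for illustration.

```python
import json


def attribute_query(cvr_number, attribute_type):
    """Build a request body that keeps only the attributter entries of one type.

    ASSUMPTION: the attribute name (e.g. "KAPITAL") is stored as a value of
    Vrvirksomhed.attributter.type. This is inferred from the posted mapping
    and is not confirmed by the thread.
    """
    return {
        "query": {
            "bool": {
                # Select the company by its CVR number, as in the original request
                "filter": [{"term": {"Vrvirksomhed.cvrNummer": cvr_number}}],
                "should": [
                    {
                        "nested": {
                            "path": "Vrvirksomhed.attributter",
                            "query": {
                                "match": {
                                    "Vrvirksomhed.attributter.type": attribute_type
                                }
                            },
                            # Return the matching attributter entries in full
                            "inner_hits": {},
                        }
                    }
                ],
                # Keep the company in the result even if no attribute matches
                "minimum_should_match": 0,
            }
        },
        "_source": ["Vrvirksomhed.cvrNummer"],
    }


body = attribute_query("35128417", "KAPITAL")
print(json.dumps(body, indent=2))
```

If the assumption holds, the matching entries would appear under inner_hits, with the actual values inside their vaerdier lists.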

Maybe there is another problem with the nested query itself (not just with selecting the output). That is a different issue from the original topic here.

I recommend posting a new topic. Since the Russian field names are difficult to understand, and even impossible to type, for most users including me, I suggest building a simpler example that reproduces the problem and sharing the complete mappings and query.

Yeah, I suspected it might be something with the way it's nested.

I'll post a new topic with a clearer description. It's Danish, not Russian, but I know that doesn't help you much.

Thanks a lot for your help!
