How to filter nested objects in nested aggregation

Hello,

I am having a minor issue with an "aggregation" query I have been using for a couple years.

My scenario:
I have an index of documents that have a "searchByName" nested property. The "searchByName" property is a list of names and I am querying the index for any documents where at least one of the names in the property list matches my search criteria. I am then performing an "aggregation" on the results of the query to group the company names into buckets and see how many times the name is present in all documents.

So for example, if I have 2 documents, one containing "Apples Company" and "Bananas Company" and a second document containing "Bananas Company" and "Oranges Company", if I search for "Bananas" with a prefix query, I am hoping to get a result something like this:

 "buckets": [
      {
          "key": "Bananas Company",
          "doc_count": 2
      }
  ]

What I am actually receiving is this:

 "buckets": [
    {
        "key": "Bananas Company",
        "doc_count": 2
    },
    {
        "key": "Apples Company",
        "doc_count": 1
    },
    {
        "key": "Oranges Company",
        "doc_count": 1
    }
]

This is because the documents contain multiple company names and it is aggregating all the company names, not just the ones that match my query statement, which makes sense.

So I am curious if there is another way for me to navigate the query down to the individual field value level, then aggregate from there instead of querying the documents, then aggregating all of the available field values.

See below for my examples. Any help would be much appreciated. Elastic has covered every other situation I have needed very well and I assume this is a scenario of lack of knowledge on my part, not lack of functionality in Elasticsearch.

Index Setup

Index Setup:
{
   "company_records":{
      "mappings":{
         "properties":{
            "datasetItem":{
               "type":"object",
               "enabled":false
            },
            "searchByName":{
               "type":"nested",
               "properties":{
                  "name":{
                     "type":"text",
                     "fields":{
                        "keyword":{
                           "type":"keyword"
                        }
                     },
                     "analyzer":"ws_search_analyzer"
                  }
               }
            }
         }
      },
      "settings":{
         "index":{
            "codec":"best_compression",
            "refresh_interval":"-1",
            "number_of_shards":"10",
            "analysis":{
               "analyzer":{
                  "ws_search_analyzer":{
                     "filter":[
                        "lowercase",
                        "asciifolding"
                     ],
                     "type":"custom",
                     "tokenizer":"whitespace"
                  }
               }
            },
            "number_of_replicas":"0"
         }
      }
   }
}

Document 1 Example:

{
   "_index":"company_records",
   "_id":"1",
   "_source":{
      "searchByName":[
         {
            "name":"Apples Company"
         },
		 {
            "name":"Bananas Company"
         }
      ]
   }
}

Document 2:

{
   "_index":"company_records",
   "_id":"2",
   "_source":{
      "searchByName":[
         {
            "name":"Bananas Company"
         },
		 {
            "name":"Oranges Company"
         }
      ]
   }
}

Query:


Query:
POST index_name/_search
{
  "_source": false,
  "query": {
    "bool": {
      "must": {
        "nested": {
          "path": "searchByName",
          "query": {
            "prefix": {
              "searchByName.name": "Bananas"
            }
          }
        }
      }
    }
  },
  "aggs": {
    "nameAggregations": {
      "nested": {
        "path": "searchByName"
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "searchByName.name.keyword",
            "size": 100
          }
        }
      }
    }
  }
}

Aggregation Reponse (I stripped out the stats and hits area):

{
    "aggregations": {
        "nameAggregations": {
            "doc_count": 2,
            "names": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "Bananas Company",
                        "doc_count": 2
                    },
                    {
                        "key": "Apples Company",
                        "doc_count": 1
                    },
                    {
                        "key": "Oranges Company",
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}

Hi!
Thank you for detailed post. It is easy to understand what you did and what you want to do.

If it is a nested field, you can filter searchByName.name not within the "query" clause, but using filter aggregation under the nested aggregation. Then you can filter the nested objects.

Perfect! That looks like exactly what I need. I will give this a try and see how it works out.

Thanks

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.