Aggregating the specific nested documents only


(Rakesh A) #1
Hi,

I just want to aggregate the specific nested documents which satisfies the given query.

Let me explain it through an example,

i have two records in my index
first one ,
{
  "project": [
    {
      "subject": "maths",
      "marks": 47
    },
    {
      "subject": "computers",
      "marks": 22
    }
  ]
}

second one,
{
  "project": [
    {
      "subject": "maths",
      "marks": 65
    },
    {
      "subject": "networks",
      "marks": 72
    }
  ]
}

i just want to have an average of maths subject alone from the given documents.

so the query i tried is
{
  "size": 0,
  "aggs": {
    "avg_marks": {
      "avg": {
        "field": "project.marks"
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "project.subject:maths",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        }
      ]
    }
  }
}
which was returning the result of aggreagating all the marks average which is not requireb by me,

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_marks": {
      "value": 51.5
    }
  }
}

I just need an average of maths subject from the given documents, i expected result like below,

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_marks": {
      "value": 56
    }
  }
}

Is there any possibility or any modification in my query would be helpful.

Thanks in Advance

(Abdon Pijpelink) #2

There are several ways of approaching this. Generally speaking, the most performant way would be to create a single document per project, rather than documents with multiple nested objects. Querying these flat "document per project" documents is going to be very fast.

However, if you need to use nested objects, make sure that you actually map those objects with the nested datatype. Otherwise you will not be able to query and aggregate those nested objects independently.

In your case, you would create your index with a mapping like this (note the project field is mapped as type nested):

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "project": {
          "type": "nested",
          "properties": {
            "subject": {
              "type": "text"
            },
            "marks": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}

Now, you can index the documents like you have been doing:

PUT my_index/_doc/1
{
  "project": [
    {
      "subject": "maths",
      "marks": 47
    },
    {
      "subject": "computers",
      "marks": 22
    }
  ]
}

PUT my_index/_doc/2
{
  "project": [
    {
      "subject": "maths",
      "marks": 65
    },
    {
      "subject": "networks",
      "marks": 72
    }
  ]
}

Then, to get the aggregation results you want, you will need to use a nested aggregation. You could move your query inside of this nested aggregation using a filter aggregation, to limit the aggregation scope to just those nested objects that match your query.

Your request would end up looking like this:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "nested_projects": {
      "nested": {
        "path": "project"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                {
                  "query_string": {
                    "query": "project.subject:maths",
                    "analyze_wildcard": true,
                    "default_field": "*"
                  }
                }
              ]
            }
          },
          "aggs": {
            "avg_marks": {
              "avg": {
                "field": "project.marks"
              }
            }
          }
        }
      }
    }
  }
}

By the way, the bool query with just a single must clause does not really add anything. You could simplify the request to this and get the same results:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "nested_projects": {
      "nested": {
        "path": "project"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "query_string": {
              "query": "project.subject:maths",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          },
          "aggs": {
            "avg_marks": {
              "avg": {
                "field": "project.marks"
              }
            }
          }
        }
      }
    }
  }
}

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.