Sum nested values while applying match conditions

Hi,

I have a document which looks like:

{
  "comments_by_day" : [
    {
      "dateofcomment" : "2017-09-20",
      "numberofcomments" : 10
    },
    {
      "dateofcomment" : "2017-09-24",
      "numberofcomments" : 5
    },
    {
      "dateofcomment" : "2017-09-29",
      "numberofcomments" : 9
    },
    {
      "dateofcomment" : "2017-10-03",
      "numberofcomments" : 25
    },
    {
      "dateofcomment" : "2017-10-07",
      "numberofcomments" : 10
    }
  ]
}

I want to write a query that applies a date filter to the entries in comments_by_day and sums the numberofcomments field over only the entries that pass the filter. For example: find users who have commented more than 10 times between 2017-09-23 and 2017-10-01.

Is it possible to do this using Elasticsearch?

Any help is highly appreciated.

Thanks.

You should take a look at nested documents.

See this example:

DELETE test

PUT test
{
  "mappings": {
    "doc" : {
      "properties": {
        "comments_by_day" : {
          "type": "nested"
        }
      }
    }
  }
}

PUT test/doc/1
{
  "comments_by_day": [
    {
      "dateofcomment": "2017-09-20",
      "numberofcomments": 10
    },
    {
      "dateofcomment": "2017-09-24",
      "numberofcomments": 5
    },
    {
      "dateofcomment": "2017-09-29",
      "numberofcomments": 9
    },
    {
      "dateofcomment": "2017-10-03",
      "numberofcomments": 25
    },
    {
      "dateofcomment": "2017-10-07",
      "numberofcomments": 10
    }
  ]
}

GET test/_search
{
  "size": 0,
  "aggs": {
    "foo": {
      "nested": {
        "path": "comments_by_day"
      },
      "aggs": {
        "my_filter": {
          "filter": {
            "bool": {
              "filter": {
                "range": {
                  "comments_by_day.dateofcomment": {
                    "gte": "2017-09-23",
                    "lte": "2017-10-01"
                  }
                }
              }
            }
          },
          "aggs": {
            "sum": {
              "sum": {
                "field": "comments_by_day.numberofcomments"
              }
            }
          }
        }
      }
    }
  }
}

Thanks for the response.

I have already indexed the comments_by_day field as a nested datatype. The problem is that the document also contains other, non-nested fields, such as:

{
  "city" : "LONDON",
  "age" : 29,
  "comments_by_day" : [
    {
      "dateofcomment" : "2017-09-20",
      "numberofcomments" : 10
    },
    {
      "dateofcomment" : "2017-09-24",
      "numberofcomments" : 5
    },
    {
      "dateofcomment" : "2017-09-29",
      "numberofcomments" : 9
    },
    {
      "dateofcomment" : "2017-10-03",
      "numberofcomments" : 25
    },
    {
      "dateofcomment" : "2017-10-07",
      "numberofcomments" : 10
    }
  ]
}

How should I modify the query that you provided so that it returns all users who live in "LONDON", are between 25 and 35 years old, and have commented more than 10 times in total between 2017-09-23 and 2017-10-01?

In the comments_by_day field above, two nested documents fall between 2017-09-23 and 2017-10-01 (2017-09-24 and 2017-09-29), and the sum of their numberofcomments values is 5 + 9 = 14, so this document should be returned by the query.
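To make the intended matching rule concrete, here is a rough sketch of it in plain Python. It is not an Elasticsearch query, only a client-side illustration of the semantics described above. The function names comments_in_range and matches are hypothetical, and both ends of the date range and of the age range are assumed to be inclusive.

```python
from datetime import date

def comments_in_range(doc, start, end):
    """Sum numberofcomments over nested entries dated within [start, end]."""
    return sum(
        entry["numberofcomments"]
        for entry in doc["comments_by_day"]
        if start <= date.fromisoformat(entry["dateofcomment"]) <= end
    )

def matches(doc, city, min_age, max_age, start, end, min_comments):
    """Top-level conditions plus the threshold on the filtered nested sum."""
    return (
        doc["city"] == city
        and min_age <= doc["age"] <= max_age
        and comments_in_range(doc, start, end) > min_comments
    )

# The sample document from this post.
doc = {
    "city": "LONDON",
    "age": 29,
    "comments_by_day": [
        {"dateofcomment": "2017-09-20", "numberofcomments": 10},
        {"dateofcomment": "2017-09-24", "numberofcomments": 5},
        {"dateofcomment": "2017-09-29", "numberofcomments": 9},
        {"dateofcomment": "2017-10-03", "numberofcomments": 25},
        {"dateofcomment": "2017-10-07", "numberofcomments": 10},
    ],
}

start, end = date(2017, 9, 23), date(2017, 10, 1)
total = comments_in_range(doc, start, end)
is_match = matches(doc, "LONDON", 25, 35, start, end, 10)
print(total, is_match)  # 14 True
```

The point of the sketch is that the date filter has to be applied to each nested entry before summing, and the "more than 10" threshold is then checked per document against that filtered sum.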

Thanks.
