Sum nested values and filter / sort by them

Hi,

as in this thread, my problem is pretty similar:

I have a product index with a nested sales_per_day array. It looks something like this:

PUT /products
{
  "mappings": {
      "properties": {
        "sales_per_day" : {
          "type": "nested"
        }
      }
  }
}

This is an example doc:

GET /products/_search
{
    "took": 838,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "products",
                "_type": "_doc",
                "_id": "rXFCUnMBxSMm9fNU3xFV",
                "_score": 1.0,
                "_source": {
                    "revenue_per_day": [],
                    "id": 123456789,
                    "customer_id": 123456,
                    "sku": "some sku",
                    "title": "some product title",
                    "sales_per_day": [
                        {
                            "date": "2020-05-10",
                            "sales": 3
                        },
                        {
                            "date": "2020-05-11",
                            "sales": 1
                        }
                    ],
                    "column100": "foobar"
                }
            }
        ]
    }
}

What i want to display to the user is:

  • the summed sales in a query-defined time-period
  • filter by those sales (sales > 5, sales = 0, sales = 10, etc.)
  • sort by those sales
  • filter other columns at the same time
  • retrieve the whole matched document

Is this possible & if so, how?

Unfortunately I was not able to solve this. I thought about the problem a lot and realized the query I was trying to write was equivalent to the HAVING clause in a sql query. I don't believe ES supports HAVING queries out of the box. I do believe that the best way to solve this problem is by using a script query of some kind. But I am not good at ES Script Queries so I gave up. Here is the documentation for it: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-script-query.html

This feature request for the product I was working on was rescinded so I didn't give it much thought after I posted the original question. But there is indeed a very good mathematical solution, if you are not interested in using script queries. It is to apply a mathematical transformation to data during ingestion. Here is what I mean:

For example, let's say this is what the original document looks like :

{
  "comments_by_day" : [
  #entry 1
  {
    "dateofcomment" : "2017-09-20",
    "numberofcomments" : 10 
  },
  #entry 2
  {
    "dateofcomment" : "2017-09-24",
    "numberofcomments" : 5
  },
  #entry 3
  {
    "dateofcomment" : "2017-09-29",
    "numberofcomments" : 9 
  }
]
}

Convert the document to look like this :

{
	"aggregate_comments_by_day" : [
	#summing entry 1
	{
		"startdateofcomments" : "2017-09-20",
		"enddateofcomments" : "2017-09-20",
		"sumofcommentsbetweendaterange": 10
	},
	#summing entry 2
	{
		"startdateofcomments" : "2017-09-24",
		"enddateofcomments" : "2017-09-24",
		"sumofcommentsbetweendaterange": 5
	},
	#summing entry 3
	{
		"startdateofcomments" : "2017-09-29",
		"enddateofcomments" : "2017-09-29",
		"sumofcommentsbetweendaterange": 9
	},
	#summing entry 1 & 2
	{
		"startdateofcomments" : "2017-09-20",
		"enddateofcomments" : "2017-09-24",
		"sumofcommentsbetweendaterange": 15
	},
	#summing entry 1, 2 & 3
	{
		"startdateofcomments" : "2017-09-20",
		"enddateofcomments" : "2017-09-29",
		"sumofcommentsbetweendaterange": 24
	},
	#summing entry 2 & 3
	{
		"startdateofcomments" : "2017-09-24",
		"enddateofcomments" : "2017-09-29"
		"sumofcommentsbetweendaterange": 14
	},
	]
}

Hope this helps. Let me know if you have any questions!

Just for infos, there is an aggs that is simular to Group by HAVING is SQL
Check this one here

Thanks @Rory and @ylasri

I'm trying around with the script filtering like this:

POST /products/_search
{
    "query": {
        "bool" : {
            "filter" : {
                "script" : {
                    "script" : {
                        "source" : "int total = 0; for (int i = 0; i < doc['sales_per_day'].length; ++i) {total += doc['sales_per_day'][i];} return total > 0;"
                    }
                }
            }
        }
    }
}

but it throws this error:

{
    "error": {
        "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:90)",
                    "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
                    "i = 0; i < doc['sales_per_day'].length; ++i) {",
                    "               ^---- HERE"
                ],
                "script": "int total = 0; for (int i = 0; i < doc['sales_per_day'].length; ++i) {total += doc['sales_per_day'][i];} return total > 0;",
                "lang": "painless",
                "position": {
                    "offset": 39,
                    "start": 24,
                    "end": 62
                }
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "products",
                "node": "C2CJD2D1Q1-wNIlbTubV4A",
                "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "script_stack": [
                        "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:90)",
                        "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
                        "i = 0; i < doc['sales_per_day'].length; ++i) {",
                        "               ^---- HERE"
                    ],
                    "script": "int total = 0; for (int i = 0; i < doc['sales_per_day'].length; ++i) {total += doc['sales_per_day'][i];} return total > 0;",
                    "lang": "painless",
                    "position": {
                        "offset": 39,
                        "start": 24,
                        "end": 62
                    },
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "No field found for [sales_per_day] in mapping with types []"
                    }
                }
            }
        ]
    },
    "status": 400
}

Isn't the name of your field sales_per_day?

Yes, sorry i played a bit with the field name. This error happens when i use the right field name.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.