Sum all first documents

My documents are like this:

{
    "InvoiceNumber" : "12545587",
    "ArticleNumber" : "45521212",
    "LineTotal" :  145.00,
    "InvoiceTotal" : 3200.00
}

Each document used to be an invoice line (article is unique per document) on an invoice.
One of the things I would like to do is this

Calculate the total value of all invoices in the system.

Note that you don't just sum InvoiceTotal, since that field is superfluously added to each article line.

I need to to sum all first (or some other way to select one document per bucket) documents from buckets created by InvoiceNumber.

Things I tried

  • I tried top_hits aggregation, but you can't sum on that
  • I tried max, but you can't sum on that
  • I tried looking at pipelines but I have no idea how that works

Hello Johan,

You could try scripted metric aggregations:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html

You should be able to accomplish this on your index using something similar to the following:

POST <your_index_name>/_search?size=0
{
    "query" : {
        "match_all" : {}
    },
    "aggs": {
        "invoice_sum": {
            "scripted_metric": {
                "init_script" : "state.invoiceTotals = 0",
                "map_script" : "state.invoiceTotals += doc.InvoiceTotal.value"
            }
        }
    }
}

Regards,
Aaron

So I took it here

POST aaa_invoices/_search
{
  "aggs": {
    "invoice_sum": {
      "scripted_metric": {
        "init_script" : "state.invoiceTotals = new ArrayList(); state.MessageNumbers = new ArrayList();",
        "map_script" : "if (state.MessageNumbers.indexOf(doc['enveloppe.Berichtnummer.keyword']) == -1) { state.invoiceTotals.add(doc['totFactBedrag'].get(0)); state.MessageNumbers.add(doc['enveloppe.Berichtnummer.keyword']); }",
        "combine_script": "double invoiceTotal = 0.0; for (i in state.invoiceTotals) { invoiceTotal += i } return [ invoiceTotal, state.invoiceTotals, state.MessageNumbers ];"
      }
    }
  }
}

But this exceeds the max_result_window and only gives me the total for 10 invoices, instead of the total of all 40k invoices.

I actually know how I can fix it - just create two indexes, one with invoices and one with invoicelines. But that beats the purpose doesn't it?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.