Elasticsearch terms, group by and sum aggregation


(Aniket) #1

I have documents in elasticsearch that contains attributes like

{
 [
    {
        "DTCREATED": "2016-09-19T18:30:00.000Z",
        "amount ": 12
        "SUBMERCHANTTRANSACTION" : -1 
    },
    {
        "DTCREATED": "2016-09-20T18:30:00.000Z",
        "amount ": 30
        "SUBMERCHANTTRANSACTION" : 0 
    },
    {
        "DTCREATED": "2016-10-25T18:30:00.000Z",
        "amount ": 35
        "SUBMERCHANTTRANSACTION" : -1 
    } 
]

}

('-1' for yes '0' for no)

I want to get sum of all amount monthwise where SUBMERCHANTTRANSACTION is '-1' (Yes). I have tried below query

requestQuery =
     Requests.searchRequest(ConstantsValue.indexName)
    .types(ConstantsValue._Type)
    .source("{size:999999,"
    + "\"_source\" : "
    + "[\"DTCREATED\", \"amount\"]"
    + ",\"filter\": "
    + "{\"terms\": {\"SUBMERCHANTTRANSACTION\": -1},"
    + "\"aggs\" : "
    + "{\"group_by_DATE\" : {\"date_histogram\" : "
    + "{\"field\" : \"DTCREATED\", \"interval\" : \"1M\","
    + "\"format\" : \"yyyy-MM\" },"
    + "\"aggs\" : "
    + "{\"intraday_return\" : { \"sum\" : { \"field\" : \"amount\" } }}}}}}}");

But it does not working. any idea what I am missing ?


(Yannick Welsch) #2

which ES version is this?

Is the filter supposed to be a filter aggregation? If so, it must be properly wrapped in an aggs block.

In older ES versions the top-level filter is a deprecated name for the post_filter, which filters the documents returned in the hits part of the response without affecting the documents used in the aggregations.


(Aniket) #3

elasticsearch-1.5.2

I have edited above query but it still showing sum of all amount month wise but not considering match condition.
Here is the edited query

 .source("{size:999999,"
   + "\"_source\" : "
   + "[\"DTCREATED\", \"amount\"]"
   + " ,\"query\": {  \"filtered\": {\"query\": { \"match\": { \"SUBMERCHANTTRANSACTION\": \"-1\"}}}}"
   + ",\"aggs\" : "
   + "{\"group_by_DATE\" : {\"date_histogram\" : "
   + "{\"field\" : \"DTCREATED\", \"interval\" : \"1M\","
   + "\"format\" : \"yyyy-MM-dd\" },"
   + "\"aggs\" : "
    + "{\"sum_bar\" : { \"sum\" : { \"field\" : \"amount\" } }}}}}}}");

(Yannick Welsch) #4
  • why not use a term filter wrapped in a constant_score query?
  • can you show your mapping (curl -XGET 'http://localhost:9200/_mapping') ? This might help understand why it's not working.

(Aniket) #5

I have shared mapping. I want term filter on SUBMERCHANTTRANSACTION aggregation for date and sum of amount

    {
      "transaction" : {
        "mappings" : {
          "doc" : {
            "properties" : {
              "amount" : {
                "type" : "long"
              },
              "DTCREATED" : {
                "type" : "date",
                "format" : "dateOptionalTime"
              },
              "SUBMERCHANTTRANSACTION" : {
                "type" : "string"
              },
            }
         }
       }
    }
}

(Yannick Welsch) #6

SUBMERCHANTTRANSACTION is an analyzed field in your mapping, which is probably not what you want. If it contains only numeric values, it should be a "long", otherwhise a non-analyzed string.

The following query works for me:

curl -XPOST 'http://localhost:9200/transaction'

curl -XPUT 'http://localhost:9200/transaction/_mapping/doc' -d '
{
    "doc" : {
        "properties" : {
          "amount" : {
            "type" : "long"
          },
          "DTCREATED" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "SUBMERCHANTTRANSACTION" : {
            "type" : "string",
            "index" : "not_analyzed"
          }
        }
     }
}
'
curl -XPOST 'http://localhost:9200/transaction/doc/1' -d '{"DTCREATED": "2016-09-19T18:30:00.000Z", "amount": 12, "SUBMERCHANTTRANSACTION": -1}'

curl -XPOST 'http://localhost:9200/transaction/doc/2' -d '{"DTCREATED": "2016-09-20T18:30:00.000Z", "amount": 30, "SUBMERCHANTTRANSACTION": 0}'

curl -XPOST 'http://localhost:9200/transaction/doc/3' -d '{"DTCREATED": "2016-10-25T18:30:00.000Z", "amount": 35, "SUBMERCHANTTRANSACTION": -1}'

curl -XPOST 'http://localhost:9200/transaction/_refresh'


curl -XPOST 'http://localhost:9200/transaction/_search?pretty' -d '
{
    "_source" : ["DTCREATED", "amount"],
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : { "SUBMERCHANTTRANSACTION" : -1 }
            }
        }
    },

    "aggs" : {
        "group_by_DATE" : {
            "date_histogram" : {
              "field" : "DTCREATED",
              "interval" : "1M",
              "format" : "yyyy-MM-dd"
            },
            "aggs" : {
              "sum_bar" : {
                "sum" : {
                  "field" : "amount"
                }
              }
            }
        }
    }
}'

which yields

  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "transaction",
      "_type" : "doc",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{"DTCREATED":"2016-09-19T18:30:00.000Z","amount":12}
    }, {
      "_index" : "transaction",
      "_type" : "doc",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{"DTCREATED":"2016-10-25T18:30:00.000Z","amount":35}
    } ]
  },
  "aggregations" : {
    "group_by_DATE" : {
      "buckets" : [ {
        "key_as_string" : "2016-09-01",
        "key" : 1472688000000,
        "doc_count" : 1,
        "sum_bar" : {
          "value" : 12.0
        }
      }, {
        "key_as_string" : "2016-10-01",
        "key" : 1475280000000,
        "doc_count" : 1,
        "sum_bar" : {
          "value" : 35.0
        }
      } ]
    }
  }
}

(Aniket) #7

Yes.. this is what I want. Thank you


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.