SQL-like GROUP BY and HAVING

Hi,

I want to achieve some functionality that is available in SQL data stores, but I am having a hard time reproducing it with Elasticsearch. I can build the queries, but they do not return the results I expect.

I want to do:

  • Group documents by some id
  • Filter out the groups using a condition on an aggregated value
  • Count the remaining groups

I tried this on Elasticsearch 5.2.2 and 5.6.3.
Basically, I want to perform a query similar to the following SQL:

SELECT COUNT(*) FROM
(
   SELECT
	senderResellerId,
	SUM(requestAmountValue) AS t_amount
   FROM
	transactions
   GROUP BY
	senderResellerId
   HAVING
	t_amount > 10000 ) AS dum;

Working Correctly

I have achieved the equivalent of GROUP BY by creating buckets as follows:

GROUP BY SQL

SELECT
	senderResellerId,
	SUM(requestAmountValue) AS t_amount
FROM
	transactions
GROUP BY
	senderResellerId

GROUP BY DSL

{
  "aggregations": {
	"reseller_sale_sum": {
	  "aggregations": {
		"sales": {
		  "sum": {
			"field": "requestAmountValue"
		  }
		}
	  }, 
	  "terms": {
		"field": "senderResellerId", 
		"order": {
		  "sales": "desc"
		}, 
		"size": 5
	  }
	}
  }, 
  "ext": {}, 
  "query": {  "match_all": {} }, 
  "size": 0
}

Not working as expected

I have also written a query that applies a filter to the buckets, but it returns incorrect results.
SQL Equivalent

SELECT
	senderResellerId,
	SUM(requestAmountValue) AS t_amount
FROM
	transactions
GROUP BY
	senderResellerId
HAVING
	t_amount > 10000

DSL

{
  "aggregations": {
	"reseller_sale_sum": {
	  "aggs": {
		"sales": {
		  "aggregations": {
			"reseller_sale": {
			  "sum": {
				"field": "requestAmountValue"
			  }
			}
		  }, 
		  "filter": {
			"range": {
			  "reseller_sale": { 
				"gte": 10000
			  }
			}
		  }
		}
	  }, 
	  "terms": {
		"field": "senderResellerId", 
		"order": {
		  "sales>reseller_sale": "desc"
		}, 
		"size": 5
	  }
	}
  }, 
  "ext": {}, 
  "query": {  "match_all": {} }, 
  "size": 0
}

Actual Results

{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 150824,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "reseller_sale_sum" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 149609,
      "buckets" : [
        {
          "key" : "RES0000000004",
          "doc_count" : 8,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000005",
          "doc_count" : 39,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000006",
          "doc_count" : 57,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        },
        {
          "key" : "RES0000000007",
          "doc_count" : 134,
          "sales" : {
            "doc_count" : 0,
            "reseller_sale" : {
              "value" : 0.0
            }
          }
        }
      ]
    }
  }
}

My Mapping

{
  "mappings": {
	"tdrs": {

	  "properties": {
		"@timestamp": {
		  "type": "date"
		},
		"@version": {
		  "type": "keyword"
		},
		"endTime": {
		  "type": "date"
		},
		"requestAmountValue": {
		  "type": "float"
		},
		"senderResellerId": {
		  "type": "keyword"
		}
	  }
	}
  },
  "settings": {
	"index": {
	  "number_of_replicas": "0",
	  "number_of_shards": "1"
	}
  }
}

Here is a very good answer by Nikolay Vasiliev.
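
For reference, a HAVING-style condition on an aggregated value is usually expressed with the bucket_selector pipeline aggregation. The sketch below is only an illustration assembled from the field names in the mapping above, not a quote of the linked answer; the names having_sales and totalSales, the terms size of 10000, and keeping the original sales ordering are arbitrary choices.

{
  "size": 0,
  "query": { "match_all": {} },
  "aggregations": {
    "reseller_sale_sum": {
      "terms": {
        "field": "senderResellerId",
        "size": 10000,
        "order": { "sales": "desc" }
      },
      "aggregations": {
        "sales": {
          "sum": {
            "field": "requestAmountValue"
          }
        },
        "having_sales": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sales"
            },
            "script": "params.totalSales > 10000"
          }
        }
      }
    }
  }
}

Note that bucket_selector only prunes buckets the terms aggregation has already built, so the terms size must be large enough to cover all senderResellerId values. The outer COUNT(*) still has to be obtained separately, for example by counting the surviving buckets on the client side or with a sibling pipeline aggregation such as stats_bucket over reseller_sale_sum>sales.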

