Can elasticsearch do GROUP BY multi fields and ORDER BY count?

In SQL I would do it possibly like this:

SELECT field1,field2, count(*) as cnt FROM table GROUP BY field1,field2 ORDER BY cnt DESC;

Is aggregate query like that possible with ES?

Hi rookie1.
Yes, you can group data by multiple fields. One difference from SQL is that that results can be a tree structure with hierarchy rather than thinking of them like a flattened table of results.
You would use the terms aggregation to group information. For example, given an index of investment data field1 might be investor and field 2 might be the company invested in:

GET /crunchbase/_search
{
  "size":0,
  "aggregations": {
	"first_by_investor": {
	  "terms": {
		"field": "investor"
	  },
	  "aggregations": {
		"then_by_company": {
		  "terms": {
			"field": "company"
		  }
		}
	  }
	}
  }
}

The results are a hierarchy like this (default sort size is by number of docs):

...	
  "aggregations" : {
	"first_by_investor" : {
	  "buckets" : [
		{
		  "key" : "New Enterprise Associates",
		  "doc_count" : 445,
		  "then_by_company" : {
			"buckets" : [
			  {
				"key" : "PatientKeeper",
				"doc_count" : 5
			  },
			  {
				"key" : "SolFocus",
				"doc_count" : 5
			  }
              ...
			},
			{
			  "key" : "SV Angel",
			  "doc_count" : 436,				  
			   ...
			}
    ]

@rookie1 or you can try exactly the same query you have there in Elasticsearch SQL and the results will be displayed just like it would when using a relational database.

Or you can use the ES SQL translate API to see what kind of Elastisearch DSL query we create from the SQL query provided. Please, note that the query will be slightly different from the one @Mark_Harwood provided, because ES SQL will use a composite aggregation on top to allow users to paginate through the results (a common requirement in SQL world using cursors).

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.