Elastic search query for composite keys like in SQL

I have below scenario: Suppose I have a relational table :

Key1   Key2  Key3   Value

A       x     v1      0
A       x     v1      10
B       x     v1      5
A       y     v2      7
A       y     v2      2

Here I have logical key composite of (Key1, Key2, Key3). Now I need records with the maximum value. So I expect below result corresponding to different logical keys:

Key1   Key2  Key3   Value

A       x     v1      10
B       x     v1      5
A       y     v2      7

Now I want to write an Elastic Search query for it. Can somebody give me some idea about it ?

Hi,

if you only need to query one row (key1, key2, key3) at a time, this is possible by simply filtering on the three key fields, descending sort on value and using only the topmost result ("size" : 1). Assuming your keys are String values and you store the non-analyzed in a subfield (here called keyword) a query might look like this.

GET /test/type/_search
{
  "size": 1, 
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "key1.keyword": {
              "value": "A"
            }
          }
        },
        {
          "term": {
            "key2.keyword": {
              "value": "y"
            }
          }
        },
        {
          "term": {
            "key3.keyword": {
              "value": "v2"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "value": {
        "order": "desc"
      }
    }
  ]
}

Or you want the terms aggregation.

Thanks for your reply, here I want all the records as I explained that I like the SQL like query where I can get different records with unique logical key.

Hi,

if you want more than a single row, you can use the terms aggregation like @nik9000 mentioned. A solution might might look something like this in your case:

GET /test/type/_search
{
  "size": 0,
  "aggs": {
    "by_key1": {
      "terms": {
        "field": "key1.keyword"
      },
      "aggs": {
        "by_key2": {
          "terms": {
            "field": "key2.keyword"
          },
          "aggs": {
            "by_key3": {
              "terms": {
                "field": "key3.keyword"
              },
              "aggs": {
                "max_value": {
                  "max": {
                    "field": "value"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

There are a few drawbacks to note though:

  • the response will have a few levels of nested buckets that you need to navigate in order to get a nicer representation like the table you get in SQL. This could be solved by introducing an extra field with the combined key (e.g. concatenated: key1_key2_key3) and then use this in a single terms aggregation
  • terms aggregations are designed to efficiently return the Top N terms per field, usually ordered by document count. Requesting all possible combinations of keys in your case might be a bad idea in terms of memory consumtion and performance (see https://github.com/elastic/elasticsearch/issues/18838 for some issues with requesting all buckets). You will really need to test this with the amount of data that you are expecting.