Elastic search query for composite keys like in SQL

I have below scenario: Suppose I have a relational table :

Key1   Key2  Key3   Value

A       x     v1      0
A       x     v1      10
B       x     v1      5
A       y     v2      7
A       y     v2      2

Here I have logical key composite of (Key1, Key2, Key3). Now I need records with the maximum value. So I expect below result corresponding to different logical keys:

Key1   Key2  Key3   Value

A       x     v1      10
B       x     v1      5
A       y     v2      7

Now I want to write an Elastic Search query for it. Can somebody give me some idea about it ?

Hi,

if you only need to query one row (key1, key2, key3) at a time, this is possible by simply filtering on the three key fields, descending sort on value and using only the topmost result ("size" : 1). Assuming your keys are String values and you store the non-analyzed in a subfield (here called keyword) a query might look like this.

GET /test/type/_search
{
  "size": 1, 
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "key1.keyword": {
              "value": "A"
            }
          }
        },
        {
          "term": {
            "key2.keyword": {
              "value": "y"
            }
          }
        },
        {
          "term": {
            "key3.keyword": {
              "value": "v2"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "value": {
        "order": "desc"
      }
    }
  ]
}

Or you want the terms aggregation.

Thanks for your reply, here I want all the records as I explained that I like the SQL like query where I can get different records with unique logical key.

Hi,

if you want more than a single row, you can use the terms aggregation like @nik9000 mentioned. A solution might might look something like this in your case:

GET /test/type/_search
{
  "size": 0,
  "aggs": {
    "by_key1": {
      "terms": {
        "field": "key1.keyword"
      },
      "aggs": {
        "by_key2": {
          "terms": {
            "field": "key2.keyword"
          },
          "aggs": {
            "by_key3": {
              "terms": {
                "field": "key3.keyword"
              },
              "aggs": {
                "max_value": {
                  "max": {
                    "field": "value"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

There are a few drawbacks to note though:

  • the response will have a few levels of nested buckets that you need to navigate in order to get a nicer representation like the table you get in SQL. This could be solved by introducing an extra field with the combined key (e.g. concatenated: key1_key2_key3) and then use this in a single terms aggregation
  • terms aggregations are designed to efficiently return the Top N terms per field, usually ordered by document count. Requesting all possible combinations of keys in your case might be a bad idea in terms of memory consumtion and performance (see https://github.com/elastic/elasticsearch/issues/18838 for some issues with requesting all buckets). You will really need to test this with the amount of data that you are expecting.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.