[SQL] Return type constraints are overly restrictive

Hi

Executing the following

DELETE nested_numbers
POST /nested_numbers/_doc
{"numbers": [{"n": 1}, {"n": 2}]}
POST /_xpack/sql?format=json
{
  "query": "select * from nested_numbers"
}

will fail with an sql_illegal_argument_exception stating "reason": "Arrays (returned by [numbers.n]) are not supported". But I'm not sure there's a good reason for this restriction (note that the format above is json, not txt).

If I take the result of the sql request, pass it to /translate and then _search I get the expected results.

GET nested_numbers/_search
# The GET body below is the response from
#
#    POST /_xpack/sql/translate
#    {
#    "query": "select * from nested_numbers"
#    }
{
  "size": 1000,
  "_source": false,
  "stored_fields": "_none_",
  "docvalue_fields": [
    "numbers.n"
  ],
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}

The only good reason I can think of for this restriction is that it makes JDBC connectivity awkward, but the JDBC driver for PostgreSQL works ok with JSON. Would it be possible to allow SQL queries to return arrays and nested types?

Finally, the error message associated with this condition is type-dependent. The following errors with # "reason": "Cannot extract value [numbers.n] from source".

DELETE nested_strings
POST /nested_strings/_doc
{"numbers": [{"n": "one"}, {"n": "two"}]}
POST /_xpack/sql?format=json
{
  "query": "select * from nested_strings"
}

Paul

I've realised that as this restriction doesn't apply when using nested docs, this isn't actually an issue for me. However, some docs describing these error messages would be very helpful.

Hi @paulcarey,
Glad to see you took ES-SQL for a spin.

The reason for not supporting arrays is, in principle, related to SQL way of dealing with values: rows and columns where each element in the matrix is a single value. When you have an array of values for a certain row and column, which one do you want to return? The first, third, fifth etc? Also, this being JSON (as the native format in which _source is stored in Elasticsearch - and JSON is by definition unordered) how do you define "first", "third" etc.

Also, SQL doesn't have the notion of arrays of values.

Thanks for the response, but stating that SQL doesn't have the notion of arrays of values isn't really correct.

Obviously 'SQL' can be a bit ambiguous, depending on versions of specs etc., but here's an example of using Arrays of Structs with JDBC and PostgreSQL.

https://www.enterprisedb.com/blog/using-java-manipulate-sql-structures-and-arrays

The Java tutorial has an example showing JDBC usage of Array of String

https://docs.oracle.com/javase/tutorial/jdbc/basics/array.html

SQL 2016 formally defines JSON support and these Microsoft docs describe usage in SQL Server.

I just wanted to point out that wrapping primitive values in objects can serve as a workaround.

DELETE array_test_1

PUT array_test_1
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "values": {
          "type": "nested",
          "properties": {
            "v": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

POST array_test_1/_doc
{
  "name": "foo",
  "values": [{"v": 1}, {"v": 2}, {"v": 3}]
}

POST _xpack/sql?format=txt
{
  "query": "select name, values.v as v from array_test_1"
}


     name      |       v       
---------------+---------------
foo            |3              
foo            |2              
foo            |1    

Would you, please, create an issue for this, with this suggestion? Thanks.

Sure, I've created https://github.com/elastic/elasticsearch/issues/33204

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.