Elasticsearch-sql can't query unindexed field

As we know , unindexed field will store in _source field,so we can get those unindexed fields in _source.But Elasticsearch-sql can't get unindexed field.

GET /_sql/translate
{
"query":"""
select * from "view-laohu-sdk" limit 1
"""
}
look the picture blow,“_source”:false lead to we can't get unindexed field
image

That's probably true. If you don't store the content, the content is searchable but you can't retrieve it.

Any reason you are doing this?

Please look at the problem carefully,I unindexed these fields and these fields stored in _source
default. But I can't retrieve these fields when I use elasticsearch-sql.
image

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Reading unindexed fields has a significant performance impact and can be very slow and scale badly. I wonder if it is therefore disabled within the SQL interface.

Maybe,but this situation is very widely used .If I use SQL ,I must index all fields although there are some fields That I don't want to do search. I think index all fields has a more significant performance impact.

Elasticsearch is not a relational database and does not store non-indexed data in a columnar format. In order to access data that is not indexed it needs to retrieve and parse the full source document, which is a lot slower than access indexed fields. This is why I suspect it might have been disabled on purpose, but someone from Elastic can probably shed some light on this.

I don't think so .If you retrieve many fields, from “_source” is faster than from docvalue.This is why Elasticsearch stored field in “_source” default.

That does depend on the number of fields returned and the size of the documents.

However, SQL Language can control what fields are returned ,such as “select field1” or “select *”,I think SQL interface should be flexible, not banned altogether

Hello @ITzhangqiang,

As detailed in SQL Limitations, most columns will now always be retrieved from _source, with some exceptions (such as the @timestamp field of type date, in your example) .

And in accordance to that, as an example: if you run something like SELECT "@timestamp", bytes FROM kibana_sample_data_logs, you will get a translation like:

 "_source" : {
    "includes" : [
      "bytes"
    ],
    "excludes" : [ ]
 },
 "docvalue_fields" : [
    {
      "field" : "@timestamp",
      "format" : "epoch_millis"
    }
 ],

since bytes is a long field, and a corresponding answer:

        "_source" : {
          "bytes" : 3298
        },
        "fields" : {
          "@timestamp" : [
            "1533540155581"
          ]
        },
2 Likes

You may not understand my question,you can reproduce the problem by following the steps below :

> ①PUT test
②PUT test/_mapping
{
  "dynamic": "false",
  "properties": {
    "field1": {
      "type": "keyword"
    }
  }
}

③PUT test/_doc/1
{
  "field1":"111",
  "field2":"222"
}
④GET test/_search

⑤GET /_sql?format=json
{
  "query":"""
    select * from "test"
    """
}

I see.

As you can read in Dynamic field mapping or the dynamic mapping parameters, the "new" dynamic2 field will not be indexed so will not be searchable.

So, returning the entire index as in your example will dump the _source, but:

POST test/_search
{
  "query": {
    "match": {
      "field2": "222"
    }
  }
}

will return nothing.

And to show where's the problem: with your example index, GET test/_field_caps?fields=field* will only return:

{
  "indices" : [
    "test"
  ],
  "fields" : {
    "field1" : {
      "keyword" : {
        "type" : "keyword",
        "searchable" : true,
        "aggregatable" : true
      }
    }
  }
}

ES/SQL queries will also not run through all stored docs of an index to match what non searchable but _source stored fields might be relevant. Besides performance, since this field is not indexed, how could SQL know the type of this field and how to handle it; ex.: should field2+1=223 be allowed, or rather CHAR_LENGTH(field2)=3?

1 Like

I see.
But if a index has 50 fields(only 10 fields need search,so we just index these 10 fields).
When I use SQL, I can't get the other 30 unindexed fields back,I think this is unreasonable.
I must index all fields,this is a waste of storage.

"how could SQL know the type of this field and how to handle it; ex.: should field2+1=223 be allowed, or rather CHAR_LENGTH(field2)=3 ?"

First,if a field is unindexed,user will not do that like field2+1=223. If they do that ,SQL can throw Exception to prompt them the wrong usage.
In addition, fetching unindexed fields is a very common use case,if SQL don't support this,That would be a great pity.

If you know in advance your index fields, you can add these to the mapping as stored, but not indexed:

PUT test/_mapping
{
  "dynamic": "false",
  "properties": {
    "field1": {
      "type": "keyword"
    },
    "field2": {
      "type":"keyword",
      "index": false,
      "store": true
    }
  }
}

That will return your field2 in SELECT * FROM cust (but you of course won't be able to search by it).

1 Like

How much?
Is that a real problem? I mean that in some corner use cases that could be a problem but I have often seen engineers (like myself) trying to solve problems they don't have actually.

In my usage scenario,a index have many fields that don't need to search.If these fields set "index": false,"store": true ,when I use "select * from xx",many fields retrieve from 'doc_value' not '_source', this should have a significant impact on performance.

Just to be sure. What is the impact? Did you measure it?

1 Like

????

If an index has thousands of fields, do you think it is faster to fetch all the fields from _source or from DocValue.
Obviously, from _source is faster :rofl: