[SQL] Expose _source in SQL queries

Hi

As far as I can see, it's not possible to select the _source field in ES SQL. Attempting to do so in 6.5.4 returns an Unknown column exception. This would be a useful feature - in my case I don't want to index the large source document, but users do sometimes want to view it.

Are there any plans to expose the _source field in SQL?

Many thanks

Paul

Hi @paulcarey.
Indeed, it's not possible to select the _source because that's a meta field and it's not actually coming from the user, meaning one searches/returns back what it's been inserted/indexed (more like "what you see is what you get" kind of approach).
There's always the alternative of indexing your own defined "source" field.

1 Like

Thanks, but why not expose a meta object on all tables that would contain _source, _id, etc?

The docs on source filtering state that

By default operations return the contents of the _source field unless you have used the stored_fields parameter or if the _source field is disabled.

So presumably there's negligible cost to returning the _source field with ES SQL.

There's always the alternative of indexing your own defined "source" field.

I could do this, but I don't want to unless absolutely necessary as the _source field already contains exactly what I want and storing it again is wasteful. Furthermore, I don't want to search or aggregate against this field, simply return it alongside fields from an existing query.

So I think the issue here is a trade off between the ergonomics of the SQL interface vs unnecessary storage. But I'd argue that exposing a meta field or similar enhances rather than detracts from ES SQL.

IMHO I'd not expect to get a JSON back when using the SQL interface but the JSON Query DSL instead but I can understand why you'd like to do this. May be that's a feature request to ask for? @costin WDYT?

Meta fields are somewhat tricky because they act like fields except they aren't. Which means they need to be treated differently and that adds exceptions which ES SQL tries to avoid as much as possible.

Let's take _id - it is available on all docs however you cannot aggregate on it by default since it's not added to the doc values. Which means SELECT _id FROM table would work but SELECT _id FROM index GROUP BY _id will not.

_source goes a step further - typically is json however it can be cbor, smile or yaml - do we return it as a string or as a binary?
Do we return it as a whole or allow easy extraction from it? Further more, since the source is essentially the origin of the document that one queries, why returning in the first place?
I get your requirement that some users might want to get access to the original document but then again, how common is that requirement? In other words, if we add this feature, is this generic enough that enough folks would find it useful?

There's an issue opened around meta fields, in particular _id:

I see this related to your issue as well since we wouldn't want to add a special field for each metadata type but rather a generic mechanism.
One for example would be to add a dedicated function, META which would be allowed only inside projections (SELECT) to essentially expose the underlying field.

SELECT META('_id') FROM.. or potentially enforce that a bit through the grammar:

SELECT META(ID) FROM... though this has the side-effect of complicating the parser which might leads to obscure messages when the same keyword is used by accident in a different context.

Another alternative would be to add dedicated fields : _id however the problem there is it can clash with a user defined field hence the use of a function (which acts as a namespace).

Long story short, it's something that ES SQL should support, it's just that it requires a bit of thinking and we haven't got around to work on it. Feedback is ofc, welcomed.

1 Like

Thanks for the detailed response - these myriad edge cases hadn't occurred to me. I'll simply index the _source field and that will resolve this issue for me indefinitely.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.