We might be able to boost _source's performance in search scripts, but I don't think we have much appetite for adding complexity there. There are other options that we recommend, mostly doc values like you've been using.
Under the hood, _source is stored in a Lucene "stored field". Stored fields are laid out to optimize two things:
- Storage space
- Fetching all the fields at once
They are stored by taking the stored fields from a few documents, sticking them together in a chunk, and then compressing that chunk. That means that when you load _source we have to decompress enough of the chunk to read the entire _source for your document. So we might have to decompress other documents as well if they are stored in the same chunk.
Then we have to deserialize the _source from whatever format it is stored in, converting it into a Java Map to pass to the script. All to get the one field.
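Here is a rough sketch of that read path in Python, just to make the cost visible. It is a toy model, not Lucene's actual format: the chunk size, the use of zlib and JSON, and the helper names build_chunks and read_field_from_source are all assumptions for illustration, and it decompresses the whole chunk where the real thing can stop early.

```python
import json
import zlib

CHUNK_SIZE = 4  # hypothetical: number of documents per compressed chunk


def build_chunks(docs, chunk_size=CHUNK_SIZE):
    """Serialize each document, group them, and compress each group as one blob."""
    chunks = []
    for start in range(0, len(docs), chunk_size):
        group = [json.dumps(d) for d in docs[start:start + chunk_size]]
        chunks.append(zlib.compress("\n".join(group).encode("utf-8")))
    return chunks


def read_field_from_source(chunks, doc_id, field, chunk_size=CHUNK_SIZE):
    """Fetch one field the row-oriented way: decompress, locate, parse, look up."""
    blob = chunks[doc_id // chunk_size]
    # The chunk is decompressed as a unit, neighbouring documents included.
    lines = zlib.decompress(blob).decode("utf-8").split("\n")
    # The full document is parsed into a dict (standing in for the Java Map).
    source = json.loads(lines[doc_id % chunk_size])
    # Only now can the single field be picked out.
    return source[field]


docs = [{"id": i, "price": i * 10, "title": f"item {i}"} for i in range(10)]
chunks = build_chunks(docs)
value = read_field_from_source(chunks, 6, "price")
```

Every call pays for the decompression and the full parse, even though only one field is wanted.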
Theoretically there is a lot we could do to make _source faster in search scripts, but doc values are already stored in a much more sensible way for this kind of thing. They are stored column-wise, so it is much faster to get the value of a single field for a single document. So even if we worked hard to save time on deserialization, we couldn't really beat the whole chunking problem.
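For contrast, a column-wise layout can be sketched like this in Python. Again this is a toy model and not how Lucene encodes doc values on disk: the helper names build_columns and read_field_from_column are made up for illustration, and it assumes a single integer value per document.

```python
from array import array


def build_columns(docs, fields):
    """Build one packed array per field, indexed by document id."""
    return {f: array("q", (d[f] for d in docs)) for f in fields}


def read_field_from_column(columns, doc_id, field):
    """Fetch one field the column-oriented way: a direct index into one array."""
    return columns[field][doc_id]


docs = [{"id": i, "price": i * 10, "title": f"item {i}"} for i in range(10)]
columns = build_columns(docs, ["price"])
value = read_field_from_column(columns, 6, "price")
```

There is nothing to decompress for neighbouring documents and nothing to parse: the lookup touches only the one column the script asked for.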
And yes, a 17x performance difference is totally reasonable. I only use _source in search scripts where I don't care how long they take.