Groovy vs. Painless performance difference

We might be able to boost _source's performance in search scripts, but I don't think we have much appetite for adding any complexity there. There are other options that we recommend, mostly doc values like you've been using.

Under the hood, _source is stored in a Lucene "stored field". These fields are stored in a way that optimizes two things:

  1. Storage space
  2. Fetching all the fields at once

They are stored by taking the stored fields from a few documents, sticking them together in a chunk, and then compressing that chunk. That means that when you load _source we have to decompress enough of the chunk to read the entire _source for your document. So we might end up decompressing other documents too if they are stored in the same chunk.
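A toy model of that chunked, row-wise layout may help make the cost visible. This is a minimal sketch, not Lucene's actual stored-fields codec: the chunk size (`DOCS_PER_CHUNK`), the use of `zlib`, and the helper names `build_chunks` and `load_source_bytes` are all hypothetical, and the sketch decompresses the whole chunk where the real codec may be able to stop earlier. It only shows that reading one document's _source means decompressing bytes that belong to its neighbours.

```python
import json
import zlib

# Hypothetical chunk size; Lucene's real stored-fields codec decides chunk
# boundaries itself and uses its own compression (e.g. LZ4 or DEFLATE).
DOCS_PER_CHUNK = 4


def build_chunks(sources):
    """Pack consecutive documents' _source into compressed chunks (row-wise)."""
    chunks = []
    for start in range(0, len(sources), DOCS_PER_CHUNK):
        batch = sources[start:start + DOCS_PER_CHUNK]
        # One JSON line per document, then compress the whole batch together.
        raw = "\n".join(json.dumps(src) for src in batch).encode("utf-8")
        chunks.append(zlib.compress(raw))
    return chunks


def load_source_bytes(chunks, doc_id):
    """Return (raw _source bytes of one document, bytes decompressed to get it).

    In this simplified model the whole chunk is decompressed, even though we
    only want one document out of it.
    """
    chunk = chunks[doc_id // DOCS_PER_CHUNK]
    decompressed = zlib.decompress(chunk)
    lines = decompressed.split(b"\n")
    return lines[doc_id % DOCS_PER_CHUNK], len(decompressed)


# Usage: ten small documents, then read document 5 back.
sources = [{"title": f"doc {i}", "price": i * 10} for i in range(10)]
chunks = build_chunks(sources)
raw, bytes_decompressed = load_source_bytes(chunks, 5)
print(json.loads(raw)["price"])  # 50
```

The gap between `bytes_decompressed` and `len(raw)` is the wasted work from sharing a chunk with other documents.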

Then we have to deserialize the _source from whatever format it is stored in, converting it into a Java Map to pass to the script. All that just to get one field.
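As a rough illustration of the deserialization step, here is a minimal sketch assuming the source is stored as JSON. The helper `field_from_source` is hypothetical, and a Python `dict` stands in for the Java Map the script receives. The point is that the whole document gets parsed even when the script reads a single value.

```python
import json


def field_from_source(raw_source, field):
    """Fetch one field the way a script sees _source: parse everything first."""
    # The entire document is parsed into a dict (the analogue of the Java Map
    # handed to the script), even though only one value is kept.
    source_map = json.loads(raw_source)
    return source_map.get(field)


# Usage: a document with a large text field we never look at.
raw = json.dumps({"title": "t", "body": "x" * 10_000, "price": 42}).encode("utf-8")
print(field_from_source(raw, "price"))  # 42
```

The 10,000-character `body` is parsed and allocated just so `price` can be returned.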

Theoretically there is a lot we could do to make _source faster in search scripts, but doc values are already stored in a much more sensible way for this kind of thing. They are stored column-wise, so it is much faster to get the value of a single field for a single document. So even if we worked hard to save time on deserialization, we couldn't really get around the chunking problem.
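For contrast with the chunked, row-wise layout, here is a minimal sketch of column-wise storage for one numeric field. It is a simplification under stated assumptions: every document has the field, values fit in a signed 64-bit integer, and the helpers `build_doc_values` and `doc_value` are hypothetical. Real Lucene doc values are encoded and compressed per segment, but a lookup still only touches that one column.

```python
from array import array


def build_doc_values(sources, field):
    """Store one numeric field column-wise, indexed by document id.

    Assumes every document has the field and it fits in a signed 64-bit int.
    """
    column = array("q")
    for src in sources:
        column.append(src[field])
    return column


def doc_value(column, doc_id):
    # Direct positional lookup: no chunk decompression, no parsing of
    # unrelated fields or neighbouring documents.
    return column[doc_id]


# Usage: the same kind of documents, but only the "price" column is kept.
sources = [{"title": f"doc {i}", "price": i * 10} for i in range(10)]
prices = build_doc_values(sources, "price")
print(doc_value(prices, 5))  # 50
```

Reading `price` for document 5 here costs one indexed access, which is why deserialization tweaks to _source could not close the gap on their own.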

And yes, a 17x performance difference is totally reasonable. I only use _source in search scripts where I don't care how long they take.
