Cache _source in memory

sumit_jain · January 5, 2016, 9:37am

I am using _source in my sort script to sort docs based on complex logic involving multi-level nested fields. But reading _source from disk for each matched document performs poorly, giving me ~3 sec response times.

Is there any way to improve this by caching the _source field in memory? I am willing to pay for more memory in order to improve the latency.

nik9000 · January 5, 2016, 12:57pm

Not that I know of. The source is likely being pulled into memory by the
OS's page cache but you still have to pay the overhead of getting it into
the heap as objects every time.

sumit_jain · January 6, 2016, 7:24pm

Why do you think it's likely to come from the file system cache? Does it work like doc values, or is it just because it's accessed frequently?

nik9000 · January 6, 2016, 7:41pm

Just because it's accessed frequently. Doc values work the same way, but they are column-oriented rather than a serialized blob. They can take up much less space on disk, and you tend to access them in order, so they are much more friendly to the disk cache than the way source is stored.
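
To make the blob-versus-column difference concrete, here is a rough Python sketch. It is not Elasticsearch code: the documents, the `offers` and `price` field names, and the idea of keeping a precomputed minimum price in a flat array are all assumptions made up for illustration. It only shows why parsing a serialized document on every access costs more than reading one value from a column.

```python
import json

# Hypothetical documents; the field names are invented for this sketch.
docs = [
    {"id": 0, "offers": [{"price": 30, "stock": 2}, {"price": 25, "stock": 0}]},
    {"id": 1, "offers": [{"price": 10, "stock": 5}]},
    {"id": 2, "offers": [{"price": 20, "stock": 1}, {"price": 40, "stock": 9}]},
]

# Row-oriented storage: one serialized blob per document,
# similar in spirit to how _source is stored.
source_blobs = [json.dumps(d) for d in docs]


def sort_key_from_source(blob):
    # The whole blob is parsed into objects on every call,
    # even though only one derived number is needed.
    doc = json.loads(blob)
    return min(offer["price"] for offer in doc["offers"])


# Column-oriented storage: one flat array indexed by doc id,
# similar in spirit to doc values. Here the value is computed once up front.
min_price_column = [min(o["price"] for o in d["offers"]) for d in docs]


def sort_key_from_column(doc_id):
    # A single array lookup, with no parsing and no per-document objects.
    return min_price_column[doc_id]


doc_ids = range(len(docs))
by_source = sorted(doc_ids, key=lambda i: sort_key_from_source(source_blobs[i]))
by_column = sorted(doc_ids, key=sort_key_from_column)
```

Both orderings come out the same; the difference is only in how much work each lookup does. Whether the sort logic can be reduced to a value stored in its own field depends on the logic itself, so treat this as an illustration of the storage layouts rather than a recipe.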