I am using _source in my sort script to sort docs based on complex logic involving multi-level nested fields. But reading _source from disk for each matched document performs poorly, giving me response times of about 3 seconds.
Is there any way to improve this by caching the _source field in memory? I am willing to pay for more memory in order to improve the latency.
Not that I know of. The source is likely being pulled into memory by the
OS's page cache but you still have to pay the overhead of getting it into
the heap as objects every time.
Just because it's accessed frequently. Doc values work the same way, but they are column-oriented rather than a serialized blob. They can take up much less space on disk, and you tend to access them in order, so they are much friendlier to the disk cache than the way source is stored.
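
To make the difference concrete, here is a rough Python sketch of the two access patterns. It is only an analogy, not how Lucene actually stores anything: the documents, the field names (`id`, `title`, `price`) and both helper functions are made up for illustration. The "source" side has to parse a whole serialized blob into objects for every matched doc, while the "doc values" side reads one value out of a packed per-field array.

```python
import json
from array import array

# Hypothetical documents; the field names are invented for this sketch.
docs = [{"id": i, "title": f"doc {i}", "price": float(i % 7)} for i in range(1000)]

# Row-oriented, similar in spirit to stored _source: one serialized blob per doc.
source_blobs = [json.dumps(d).encode("utf-8") for d in docs]

# Column-oriented, similar in spirit to doc values: one packed array per field.
price_column = array("d", (d["price"] for d in docs))


def sort_by_source(doc_ids):
    # Each sort key requires deserializing the entire blob into heap objects.
    return sorted(doc_ids, key=lambda i: json.loads(source_blobs[i])["price"])


def sort_by_column(doc_ids):
    # Each sort key is a direct read from a contiguous array.
    return sorted(doc_ids, key=lambda i: price_column[i])


matched = list(range(len(docs)))
blob_bytes = sum(len(b) for b in source_blobs)
column_bytes = len(price_column) * price_column.itemsize

# Both approaches produce the same ordering; only the cost per key differs.
same_order = sort_by_source(matched) == sort_by_column(matched)
```

Timing the two functions (for example with `timeit`) should show the per-document parsing cost, and comparing `blob_bytes` with `column_bytes` shows how much less data the column holds for a single field.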