I am testing the performance of my ES cluster. I index artificial documents at a high rate and then run searches against them. I do full-text search (from a client program). The results are really good!
What I need to know is to what extent caching is responsible for these results.
Is the result of a query cached after the first search? In the Marvel dashboard I don't see any increase in the graphs related to cache performance.
I have read a lot, but I couldn't figure out what exactly is right.
I am already familiar with filters and bitsets. The search I used in my tests is a full-text search with no filters, so bitsets don't come into play in this case.
Unless you are using sorting, aggregations or nested/parent-child docs in your queries the field data cache will not be used. Elasticsearch (and Lucene) heavily use the OS file system cache to cache index files and avoid going to disk.
When you first run a search, the segments being read off the disk are loaded into the OS file system cache automatically by the OS, so that subsequent accesses to those files can be served from memory rather than going back to the disk. Because Lucene never modifies a segment file once it has been written, this works very well with the file system cache: a cached file never needs to be invalidated. You won't be able to turn this caching off, as it's done automatically by the OS.
Could you explain why you want to disable caching? It's built in to improve performance.
Of course the cache is built to improve performance, but I want to experiment with how search performs without the cache enabled and compare response times, for example.
Anyway, thanks for the explanation!
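For anyone who wants to run the comparison described above, here is a minimal sketch of a timing harness in Python. It is not tied to any Elasticsearch client: `run_query` is a hypothetical placeholder for whatever call your client program makes, and `time_calls` and `cold_vs_warm` are names made up for this example. It assumes the first call after a restart (or after the OS cache has been emptied) is the "cold" one and treats the remaining calls as "warm".

```python
import statistics
import time


def time_calls(run_query, repeats=5):
    """Call run_query() `repeats` times; return each wall-clock latency in seconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: your client's search call goes here
        latencies.append(time.perf_counter() - start)
    return latencies


def cold_vs_warm(latencies):
    """Return (first latency, median of the remaining latencies)."""
    if len(latencies) < 2:
        raise ValueError("need at least two measurements")
    return latencies[0], statistics.median(latencies[1:])
```

You would pass your own search function, e.g. `cold, warm = cold_vs_warm(time_calls(my_search, repeats=10))`, where `my_search` is whatever wraps your query. Keep in mind the first call is only really cold if the index files are not already in the OS file system cache, so the numbers are only meaningful right after the cache has been cleared or the machine restarted.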