I'm trying to do performance tests on various queries we're using; I'm
interested in the worst-case performance, that is, how long it takes
to run the query for the first time without any caching etc. (I
already know that cached queries are acceptably fast in all cases).
What I can't figure out is how to achieve this; when I run a query for
the second time, it is blazingly fast, even if it takes seconds to
finish on the initial attempt. It appears that ES caches it somewhere,
but I've tried clearing caches and flushing, to no avail.
There are many aspects to caching. It starts with the OS file system cache.
The other main cache that is expensive is the one used for sorting /
facets, which you can clear, but it can take a long time to load for the
initial query. Are you using facets / sorting (not on score)? Another cache
is the filter cache; are you using filters?
To clear those across all indices, you can run: curl -XPOST
localhost/_cache/clear. What you did was clear the cache for the _river
index, which is not the actual index you have and query...
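For anyone scripting this, here is a minimal Python sketch of the same cache-clear call. It assumes Elasticsearch is listening on http://localhost:9200 (the usual default HTTP port, not stated in the thread); the helper names clear_cache_url and clear_cache are made up for illustration. Passing an index name targets that index only, which avoids clearing the wrong index by accident.

```python
from typing import Optional
from urllib import request


def clear_cache_url(base_url: str, index: Optional[str] = None) -> str:
    """Build the cache-clear URL for all indices, or for a single index."""
    base = base_url.rstrip("/")
    if index:
        return f"{base}/{index}/_cache/clear"
    return f"{base}/_cache/clear"


def clear_cache(base_url: str = "http://localhost:9200",
                index: Optional[str] = None) -> str:
    """POST to the cache-clear endpoint and return the raw response body.

    The base URL is an assumption; adjust it to wherever your node listens.
    """
    req = request.Request(clear_cache_url(base_url, index), data=b"", method="POST")
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Note that this only clears the caches Elasticsearch manages itself; it does not touch the OS file system cache.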
No sorting, filters, or facets for now (although I plan to use them
later on). I've tried the curl command that you mentioned, but I'm
still seeing, say, 1000ms when I first run the query and 120ms when I
run it again later. Most of the searches on our system will probably
run just once, ever, and we let our users combine queries, so knowing
the point at which queries become too complex to run in acceptable
time is important. Currently I'm trying to make up new queries every
time I test something, but this takes around 10 minutes for every
single test run and is probably not a good idea when it comes to
comparing the results. Is there anything that can be done about this?
Is there any way to prevent ES from saving the cache to disk (so
that I can just restart it to get rid of the cache), or something like
that?
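One way to make such measurements repeatable, instead of inventing a new query each time, is a small harness that resets state before every timed run. The sketch below is only an illustration: time_cold_runs, run_query and reset are hypothetical names, and what reset actually does (clearing ES caches, dropping the OS file cache, restarting the node) is up to you.

```python
import time
from typing import Callable, List


def time_cold_runs(run_query: Callable[[], object],
                   reset: Callable[[], object],
                   repeats: int = 3) -> List[float]:
    """Time run_query several times, calling reset before each run.

    Returns the elapsed wall-clock time of each run in milliseconds.
    The reset call itself is not included in the measurement.
    """
    timings = []
    for _ in range(repeats):
        reset()  # e.g. clear caches so every run starts cold
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings
```

Running the same query through this with and without a reset step gives comparable cold and warm numbers for one fixed query.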
You can potentially disable the file system cache. I did it once on
Ubuntu, but you will need to figure out how to do it. What is the query
that you execute?
Thanks for the hint regarding the file system cache; it turns out that
was the issue. There's a tool named purge for Mac OS X (included in the
developer tools) which wipes all inactive RAM, which is where the file
system cache resides. That solves it for me.
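For reference, dropping the file system cache between test runs could be scripted roughly as follows. This is a sketch under assumptions: on Mac OS X it calls the purge tool mentioned above, and on Linux it uses the kernel's /proc/sys/vm/drop_caches interface, which was not named in this thread and needs root. The function names are hypothetical.

```python
import platform
import subprocess
from typing import List, Optional


def drop_fs_cache_command(system: Optional[str] = None) -> List[str]:
    """Return a command that drops the OS file system cache on this platform."""
    system = system or platform.system()
    if system == "Darwin":
        # purge may require elevated privileges on some OS versions.
        return ["purge"]
    if system == "Linux":
        # Flush dirty pages first, then drop page cache, dentries and inodes.
        # Must be run as root.
        return ["sh", "-c", "sync && echo 3 > /proc/sys/vm/drop_caches"]
    raise NotImplementedError(f"no known cache-drop command for {system}")


def drop_fs_cache() -> None:
    """Run the platform's cache-drop command, raising if it fails."""
    subprocess.run(drop_fs_cache_command(), check=True)
```

Only do this on a test machine: it slows down everything else on the box until the cache warms up again.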
Hello everyone, I am getting the following error on the Eclipse console when using Elasticsearch to run queries after a certain time.
In the launch configuration, the memory allocation is as follows:
-XX:MaxPermSize=256m
-Xmx800m
..
Error injecting constructor, java.lang.OutOfMemoryError: unable to create new native thread
at org.elasticsearch.threadpool.ThreadPool.<init>(Unknown Source)
while locating org.elasticsearch.threadpool.ThreadPool
Can anyone explain how to solve this issue?
Thanks in advance.