Clear cache for performance tests

Hi!

I'm trying to do performance tests on various queries we're using; I'm
interested in the worst-case performance, that is, how long it takes
to run the query for the first time without any caching etc. (I
already know that cached queries are acceptably fast in all cases).
What I can't figure out is how to achieve this; when I run a query for
the second time, it is blazingly fast, even if it takes seconds to
finish on the initial attempt. It appears that ES caches it somewhere,
but I've tried clearing caches and flushing, to no avail.

I've tried doing the following things before re-running a query (yes,
ES is running on port 4002, and the index is called my_river):
curl -XPOST 'http://localhost:4002/my_river/_cache/clear'
curl -XPOST 'http://localhost:4002/my_river/_flush'

What else do I need to do to eliminate this caching behavior?

Thanks!

Felix

There are many aspects to caching. It starts with the OS file system cache.
The other main cache that is expensive is the one used for sorting /
facets, which you can clear, but it can take a long time to load for the
initial query. Are you using facets / sorting (not on score)? Another cache
is the filter cache; are you using filters?

To clear those across all indices, you can run: curl -XPOST
localhost/_cache/clear. What you are doing is clearing the cache for the _river
index, which is not the actual index that you have and query...
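
As a minimal sketch of the two scopes mentioned above (one index versus all indices), the shell helper below builds the matching clear-cache URL. The function name clear_cache_url and the ES_HOST variable are made up for illustration, and the default host and port are simply the ones from the original question, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper, not part of Elasticsearch: builds the clear-cache URL.
# ES_HOST is an assumption; the original question used port 4002.
ES_HOST="${ES_HOST:-http://localhost:4002}"

# clear_cache_url [INDEX]
# With an index name, target only that index; with no argument, all indices.
clear_cache_url() {
  if [ -n "${1:-}" ]; then
    printf '%s/%s/_cache/clear\n' "$ES_HOST" "$1"
  else
    printf '%s/_cache/clear\n' "$ES_HOST"
  fi
}
```

With this in place, curl -XPOST "$(clear_cache_url)" would clear the caches of all indices, while curl -XPOST "$(clear_cache_url my_river)" would clear only the one named index.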

No sorting, filters, or facets for now (although I plan to use them
later on). I've tried the curl command that you mentioned, but I'm
still seeing, say, 1000ms when I first run the query and 120ms when I
run it again later. Most of the searches on our system will probably
run just once, ever, and we let our users combine queries, so knowing
the point at which queries become too complex to run in acceptable
time is important. Currently I'm trying to make up new queries every
time I test something, but this takes around 10 minutes for every
single test run and probably is not a good idea with regard to
comparing the results... Is there anything that can be done about
this? Is there any way to prevent ES from saving the cache to disk (so
that I can just restart it to get rid of the cache) or something like
that?

Thanks!

Felix
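
One way to make the first-run versus repeat-run comparison reproducible is to let curl report the request time itself. The sketch below is only an illustration: the function names time_query and compare_runs are hypothetical, and it relies on curl's --write-out variable time_total, which covers the whole HTTP round trip rather than only the time spent inside ES.

```shell
#!/bin/sh
# time_query URL BODY
# Prints the total request time in seconds as measured by curl.
# The response body is discarded; only the timing is printed.
time_query() {
  curl -s -o /dev/null -w '%{time_total}\n' \
       -H 'Content-Type: application/json' \
       -XPOST "$1" -d "$2"
}

# compare_runs URL BODY
# Runs the same query twice in a row and prints both timings on one line.
compare_runs() {
  first=$(time_query "$1" "$2")
  second=$(time_query "$1" "$2")
  printf 'first=%ss second=%ss\n' "$first" "$second"
}
```

A call could look like compare_runs 'http://localhost:4002/my_river/_search' '{"query":{"match_all":{}}}', where the match_all body is only a placeholder for the real query. The search response also carries a took field in milliseconds, which may be the better number if network overhead should be excluded.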

You can potentially disable the file system cache. I did it once on
Ubuntu, but you will need to figure out how to do it. What is the query
that you execute?

Thanks for the hint regarding the file system cache; it turns out that
was the issue. There's a tool named purge for Mac OS X (included in the
developer tools) which wipes all inactive RAM, which is where the file
system cache resides. That solves it for me.

Regards,
Felix
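
For anyone repeating this on another machine, here is a small sketch that picks the cache-dropping command by operating system. The function name drop_cache_cmd is made up for illustration. The Mac OS X branch is the purge tool mentioned above; the Linux branch uses the kernel's /proc/sys/vm/drop_caches interface, which was not discussed in this thread and is included as an assumption about what fits an Ubuntu-like system. Both typically need root privileges.

```shell
#!/bin/sh
# drop_cache_cmd [KERNEL_NAME]
# Prints (does not run) the command that drops the OS file system cache.
# KERNEL_NAME defaults to the output of `uname -s`.
drop_cache_cmd() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin) echo 'purge' ;;
    Linux)  echo 'sync; echo 3 > /proc/sys/vm/drop_caches' ;;
    *)      echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}
```

A cold test run would then be: execute the printed command as root, for example sudo sh -c "$(drop_cache_cmd)", clear the ES caches, and only then time the query.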

Hello everyone, I am getting the following error in the Eclipse console after using Elasticsearch to run queries for a certain time.
The memory allocation in the launch configuration is as follows:
-XX:MaxPermSize=256m
-Xmx800m

..
Error injecting constructor, java.lang.OutOfMemoryError: unable to create new native thread
at org.elasticsearch.threadpool.ThreadPool.<init>(Unknown Source)
while locating org.elasticsearch.threadpool.ThreadPool

Can anyone explain how to solve this issue?
Thanks in advance.