The Elasticsearch setup consists of only a single node, and the index
consists of a single shard with 0 replicas. The Elasticsearch version is
0.90.7.
The fields mentioned above are only the first 8 fields. The actual term
filter that we execute mentions 350 fields.
We noticed memory spiking by about 2-3 GB even though the store size is
only 504 MB.
Running the query multiple times seems to continuously increase the
memory.
Could someone explain why this memory spike occurs?
This is something that I just "discovered" as well.
Using a top-level filter really means using a "post_filter" (it was renamed
to that in later versions of ES).
So, this will execute the query first (a default "match_all": {}) and then
execute the filter on that result set. This is not very efficient for your
query, since I suspect you expected the filter to act as a "pre-filter" and
narrow the results before the query executes.
To do that, you need to use a "filtered query":
In your case, the resulting query would look like:
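A minimal sketch of the filtered form (the field name and value below are placeholders, not your actual 350 fields):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "some_field": "some_value" }
      }
    }
  }
}
```

Here the filter restricts candidate documents via the inverted index before the query part is evaluated, instead of being applied to the query's results afterwards.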
So does that mean all the documents in the shard (since the index has only
1 shard) are pulled into memory and then the filter is applied?
By moving it to a filtered query, I see the response coming back in around
15 s, compared to more than a minute previously. However, the memory still
increases by around 2-3 GB.
Re-running the filtered query again multiple times does not further
increase the memory.
Though the index store is only 504 MB, could you explain why the memory
spikes to 2-3 GB even with a filtered query?
With the filtered-query approach, does the filtering happen at the disk
level?
Could you also explain why I don't see the memory increasing further with
multiple runs of the filtered-query?
On Friday, November 21, 2014 6:15:27 PM UTC+5:30, Nick Canzoneri wrote:
As Nick pointed out, this query is going to match documents on a doc-by-doc
basis, which is going to be very slow.
However, each iteration is not supposed to increase memory usage. Memory
usage might jump on the first request because Elasticsearch will need to
load the date_value field into field data, and potentially your term
filter into the filter cache, but that should be it; subsequent executions
of this request should not add 2 GB of garbage.
Something that is uncommonly high in your query is the size parameter. Do
you have an idea of how large your documents are? It could be that much of
the garbage is generated while building and then serializing the search
response.
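If large responses turn out to be the culprit, one mitigation to sketch (field names here are placeholders) is lowering size and asking for only the fields you need, so each hit is much smaller than the full document:

```json
{
  "size": 100,
  "fields": ["field_1", "field_2"],
  "query": { "match_all": {} }
}
```

With "fields", each hit carries only the named values instead of the whole _source, which shrinks both the in-memory response structure and the serialized payload.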
One difference is that the store is compressed while the structure of the
response that is built into memory is not (and potentially very wasteful).
The doc-by-doc comparison is done by the post-filter: for every document
that matches the query, the filter is evaluated in order to know whether it
matches or not. On the other hand, when a filter is in the query (either
under a constant_score or a filtered_query), it can efficiently jump to the
next matches using the inverted index.
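For contrast, the post-filter placement being described is the top-level filter (sketched with a placeholder term filter): everything under "query" runs first, and this filter is then checked against each matching document afterwards:

```json
{
  "query": { "match_all": {} },
  "filter": {
    "term": { "some_field": "some_value" }
  }
}
```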
I upgraded to ES 0.90.13. What I'm now noticing is that memory seems
to continuously increase when running the query again and again.
I also noticed that after upgrading it started using OpenJDK 1.6.0_33, so I
switched back to Oracle JDK 1.7.0_71; however, the issue persists.
On Friday, November 21, 2014 10:57:09 PM UTC+5:30, Adrien Grand wrote:
Seeing memory continuously increasing over a couple of requests is not
necessarily a bad sign. If you give X gigabytes of memory to a JVM, it
won't hesitate to use them if doing so helps decrease the frequency at
which it has to run costly garbage collections. What is more important to
watch is how memory usage behaves over a long period, e.g. whether the
frequency at which GCs run keeps increasing (which could indicate that the
server is encountering memory pressure or that something is leaking memory
somewhere).
If you want to export your index, a more efficient way would be to use
the scan search type:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html.
It basically opens a cursor that you can iterate on, as opposed to trying
to get everything at once.
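As a sketch against a hypothetical index (the index name, timeout, and size are placeholders): start the scan with search_type=scan and a scroll timeout on the search URL, e.g. POST /my_index/_search?search_type=scan&scroll=1m, with a body such as:

```json
{
  "query": { "match_all": {} },
  "size": 50
}
```

The initial response returns a _scroll_id and no hits; you then repeatedly call GET /_search/scroll?scroll=1m with that id to fetch the next batch, stopping when a batch comes back empty. Note that in scan mode, size applies per shard rather than to the whole result set.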