Memory utilization - predicting 'out of heap space' errors


(Yooz) #1

Hello,

We are running ES 0.17.7 on 4 16GB boxes. 14GB of memory is locked
(dedicated) for ES. There are several indices, the largest ones range from
50GB to 200GB of data @ 50 - 125 million documents. Currently there is no
issue with memory, but the data size is continually growing. Because the
memory is locked, it is hard to tell from the OS level how much memory ES
actually needs. Is there a parameter exposed in the status API or a rule of
thumb based on the data that shows if ES is running close to the limit?

I.e. because ES scales so well, we want to add more capacity ahead of time
in order to avoid errors due to memory issues.

Also, Shay has mentioned in several posts that slicing the data up by date
indices (daily, weekly), should minimize the amount of memory used by the
Lucene indices. Is this optimization driven by the use-case? I.e. users
would be most interested in the recent data and would query the past N days
rather than search the entire index? What happens if they consistently
query the entire date range, would this slicing scheme become inefficient
compared to having one massive index?

Thanks!
--young


(Ævar Arnfjörð Bjarmason) #2

Try Bigdesk; it gives you a graphical view of allocated / used memory.
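If you want a number rather than a graph, the node stats API exposes the JVM heap used per node. A sketch (the /_cluster/nodes/stats path is the 0.17-era location; later versions moved it to /_nodes/stats):

```
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
```

The jvm.mem section of the response should show heap used against heap committed, which is about the closest thing to a "how close to the limit am I" parameter.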
On Oct 9, 2011 7:59 PM, "Yooz" youngmaeng@gmail.com wrote:



(Yooz) #3

Thanks! Will try it out.

I've noticed that searches on some of the larger indices take 8-10
seconds. For example, on an index with 160 million documents at ~220GB
(not counting replication) and 15 shards, a text phrase search on a single
field consistently took ~8500 milliseconds (changing the actual phrase each
time to prevent cached results). There are no other clients on the cluster.
The search query is a single phrase query with a range filter on a field
called postdate (the date of the document). No facets, no sorting. The
query type is query_and_fetch.
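For concreteness, a request along those lines might look like this in the 0.17-era query DSL (the field name "body" and the dates are made up for illustration; text_phrase was the phrase-query name before it became match_phrase in later versions):

```json
{
  "query": {
    "filtered": {
      "query": {
        "text_phrase": { "body": "some example phrase" }
      },
      "filter": {
        "range": { "postdate": { "from": "2011-01-01", "to": "2011-10-09" } }
      }
    }
  }
}
```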

Is this expected performance? Is there a way to decrease the latency down
to sub-second response times for an index of this size?

Thanks again,
--young


(Clinton Gormley) #4


Try using a numeric_range filter for postdate instead. The numeric_range
filter works well for fields with very many unique terms (as a datetime
field typically has).
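A sketch of that change, assuming a filtered query (match_all stands in here for the actual phrase query; the dates are illustrative):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "numeric_range": {
          "postdate": { "from": "2011-01-01", "to": "2011-10-09" }
        }
      }
    }
  }
}
```

The trade-off is that numeric_range works off the field's values loaded into memory rather than iterating the term dictionary, which tends to win when nearly every document has a distinct value, as timestamps usually do.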

clint


(Yooz) #5

Great suggestion Clint! This small change dropped the average response
time by more than half: queries now take 2500-4000 milliseconds (again with
no facets or sorting), down from ~8500 milliseconds.

When I turn on 3 terms facets (2 fields with cardinality in the 10's, 1 with
cardinality in the 100's, all limited to the top 20 results) and a date
histogram facet with the interval set to "week", the performance becomes a
bit more unpredictable. Response times range from 2000-10000 milliseconds.
I'm not sure what the variable here is (maybe the number of results
returned?). Monitoring in Bigdesk seems to indicate that resources are
sufficient (only small CPU spikes during query execution, no memory-usage
flux). Is it reasonable to expect sub-second query responses with large
indices? A 2-3 second return time is acceptable in our use case, but it
would be nice to know what the bottleneck is.
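For reference, a facet section along those lines (the field names source, author, and category are invented for illustration; only postdate comes from the thread):

```json
{
  "query": { "match_all": {} },
  "facets": {
    "source":   { "terms": { "field": "source",   "size": 20 } },
    "author":   { "terms": { "field": "author",   "size": 20 } },
    "category": { "terms": { "field": "category", "size": 20 } },
    "per_week": { "date_histogram": { "field": "postdate", "interval": "week" } }
  }
}
```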

Cheers!
--young


(Shay Banon) #6

Can you gist the full search request you make, with all the parameters? Most
of the time, query_then_fetch is better than query_and_fetch.
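The search type is passed as a URL parameter on the search request, e.g. (the index name is illustrative):

```
curl -XGET 'http://localhost:9200/myindex/_search?search_type=query_then_fetch' -d '{
  "query": { "match_all": {} }
}'
```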

On Tue, Oct 11, 2011 at 2:54 AM, Yooz youngmaeng@gmail.com wrote:



(Yooz) #7

Here is the gist generated directly from the java client:

As the queries ramped up, heap usage (per Bigdesk) shot up from 3-4GB per
box to 12-14GB (14GB is the max). The filter cache settings are at their defaults.


(Yooz) #8

I think I understand what the issue is with my setup. What I didn't mention
was that I have 15 indices with HUGE differences in the number of documents
they hold. E.g. one index consists of 1,000 documents, while
another consists of 180 million documents. Because elasticsearch rebalances
the cluster based solely on the number of shards per node, no consideration
is given to the amount of data each shard holds. Consequently, I've been
getting the larger index shards allocated to the same node resulting in
memory issues for that particular instance. In some cases this causes an
out of memory error and brings that node to a wedged state (doesn't show up
in es monitoring, logs continually spit out 'out of memory') and the data
proceeds to be rebalanced to the other nodes. Unfortunately this results in
a cascading failure because the same problem happens when the same load hits
the next unlucky node.

Are there any plans to change the rebalancing scheme in the near future? If
not, could I get a few entry points in the source on where I would patch the
code to make index size aware rebalancing work?

Thanks!
--young


(Yooz) #9

Also, would there be any issues scaling up to 10,000's of indices (most of
them very small, on the order of 1,000 documents, but a few monolithic ones
at 300 million documents)?

Thanks again!
--young


(ppearcy) #10

I'd recommend more RAM for the OS disk caches. I normally run a 50/50
ES to OS RAM config. It's easy enough to tweak and test. We do this
with 16GB servers and 48GB servers. Also, keep in mind initial queries
aren't blazing fast, as disk caches and internal ES caches need to get
built up.
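On a 16GB box that 50/50 split would mean an 8GB heap, with the rest left to the OS page cache. In the 0.17-era startup scripts the heap was sized via environment variables (names may differ across versions, so check bin/elasticsearch):

```
# give elasticsearch a fixed 8GB heap; the remaining ~8GB stays
# available to the OS for filesystem caching of the Lucene files
export ES_MIN_MEM=8g
export ES_MAX_MEM=8g
```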

Typically, that many indices is not recommended. Better to have a
larger index with lots of shards and use the routing functionality.
Only testing will tell how things perform with your setup and data,
though :)
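A sketch of the routing idea: documents indexed with the same routing value land on the same shard, and searches that pass the value only touch that shard (the index name, type, and customer42 value are made up):

```
# index with an explicit routing value
curl -XPUT 'http://localhost:9200/bigindex/doc/1?routing=customer42' -d '{
  "postdate": "2011-10-13",
  "body": "..."
}'

# a search with the same routing value hits only the relevant shard
curl -XGET 'http://localhost:9200/bigindex/_search?routing=customer42' -d '{
  "query": { "match_all": {} }
}'
```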

Best Regards,
Paul

On Oct 13, 4:10 pm, Yooz youngma...@gmail.com wrote:



(Shay Banon) #11

Heya, yea, the balancing scheme elasticsearch uses is not perfect for this
type of scenario. It was proposed to add one based on index size / document
count, but it's not there yet (though a lot of work has gone into building
the basis for it, which also allowed for awareness allocation).

It's not a simple feature, though. I might implement a simpler one for now
(it requires some thinking): allowing you to say that an index should be
rebalanced on its own, without taking the rest of the indices into account.

On Fri, Oct 14, 2011 at 12:00 AM, Yooz youngmaeng@gmail.com wrote:



(Shay Banon) #12

10k indices might be too many for a smaller set of nodes. Each index (well,
actually each shard, but even with 1 shard per index, that's 10k shards) is a
Lucene index, which requires resources. As was suggested, I recommend
using a larger number of shards and routing instead.

On Fri, Oct 14, 2011 at 12:10 AM, Yooz youngmaeng@gmail.com wrote:



(Yooz) #13

@Paul: Thanks for the tip! I didn't think about the OS level caches and just
dedicated all the memory to the ES process.

@Shay: Thanks for the clarification and suggestions!

