Search on frozen index causes huge JVM heap usage

Ali_Reza_Haghighat · July 31, 2019, 11:07am

Hi,
I ran the following test to analyze the effect of frozen indices on JVM heap usage.
I have a single node cluster with the following properties:

JVM Heap: 6 GB
Number of indices: 200
Number of shards: 200
Number of frozen indices: 200 (all indices are frozen!)
Size of each shard: 3 GB
Used JVM Heap: 3.1 GB

I ran a simple query against Elasticsearch:

{ "query": { "match_all": {} } }

Running this query drove JVM heap usage up to 5.99 GB for 3-4 minutes. Elasticsearch eventually returned the result, but based on the Elastic documentation I expected much lower heap usage:

Elastic documentation on frozen indices:

Elasticsearch builds the transient data structures of each shard of a frozen index each time that shard is searched, and discards these data structures as soon as the search is complete. Because Elasticsearch does not maintain these transient data structures in memory, frozen indices consume much less heap than normal indices

Frozen index issue on GitHub:

When a search targets frozen indices, each index would be searched sequentially (instead of in parallel), and each index would be loaded, searched, then unloaded again.

Thank you in advance!

danielmitterdorfer · August 1, 2019, 12:58pm

Hi,

thanks for sharing this here. Can you please explain your measurement approach in a little bit more detail? Specifically, I'd be interested in:

How did you measure shard size?
Are these indices force-merged? Did you force-merge them to a single segment?
What do you mean by the metric "Used JVM Heap" that you present at the beginning of your post?
How did you measure JVM heap while running the query?
How many indices did you query? All of them or only a subset?

The documentation talks about memory usage relative to regular indices, so I suggest basing heap usage expectations not on absolute numbers but on a comparison of memory usage with regular indices versus frozen indices.

Daniel

Ali_Reza_Haghighat · August 3, 2019, 1:46pm

Hi Daniel,

Thanks for your reply.

I have used /_cat/indices API to measure shard size.
No, I have not run the _forcemerge.
I wrote a Python script to measure the used JVM heap by requesting /_nodes/stats/jvm and reading the "jvm->mem->heap_used_in_bytes" field.
My script reads the JVM heap periodically (e.g., every 5 seconds).
I ran the {"query" : {"match_all": {}}} query on all indices.
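The polling approach described above could look roughly like the sketch below. It is not the original script: the URL, the function names, and the sample count are assumptions, and it presumes a local node without authentication. Only the endpoint and the `heap_used_in_bytes` field come from the post.

```python
import json
import time
import urllib.request

# Assumption: a single local node on the default HTTP port, no authentication.
NODES_STATS_URL = "http://localhost:9200/_nodes/stats/jvm"


def heap_used_by_node(stats):
    """Map node name -> heap_used_in_bytes from a /_nodes/stats/jvm response."""
    return {
        node["name"]: node["jvm"]["mem"]["heap_used_in_bytes"]
        for node in stats["nodes"].values()
    }


def poll_heap(url=NODES_STATS_URL, interval_s=5, samples=60):
    """Print heap usage every `interval_s` seconds, `samples` times."""
    for _ in range(samples):
        with urllib.request.urlopen(url) as resp:
            stats = json.load(resp)
        for name, used in heap_used_by_node(stats).items():
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} {name}: {used / 2**30:.2f} GiB")
        time.sleep(interval_s)
```

Calling `poll_heap()` in one terminal while the search runs in another gives a time series of heap usage. Note that each sample is the heap occupancy at that instant, which can include garbage that has not been collected yet, so peaks may overstate what the search strictly needs.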

I will run some tests to compare memory usage between regular indices and frozen indices. But based on my earlier experiments, I think Elasticsearch requires only 8 GB to handle a similar query under the same conditions.

Christian_Dahlqvist · August 3, 2019, 1:53pm

You should forcemerge down to 1 segment before you freeze indices. Try it and see what difference it makes.

Ali_Reza_Haghighat · August 3, 2019, 2:02pm

Thanks Christian,

I tried to run _forcemerge, but I wasn't successful.
I ran this command on an index with 1 shard and 23 segments:

curl -X POST 'http://localhost:9200/geonames2-0/_forcemerge?max_num_segments=1'

I get the following result:

{"_shards":{"total":1,"successful":1,"failed":0}}

However, the index still has 23 segments (even after 2 hours).
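One way to check the segment count from a script is to read the `_cat/segments` API as JSON, which returns one row per segment. The sketch below is only an illustration: the base URL and helper names are assumptions, and it presumes the node reports segments for the index in question.

```python
import json
import urllib.request
from collections import Counter

# Assumption: a single local node on the default HTTP port, no authentication.
BASE_URL = "http://localhost:9200"


def segments_per_shard(rows):
    """Count segments per (index, shard, prirep) from _cat/segments JSON rows."""
    return Counter((r["index"], r["shard"], r["prirep"]) for r in rows)


def fetch_segment_rows(index, base_url=BASE_URL):
    """Fetch the _cat/segments rows for one index as a list of dicts."""
    url = f"{base_url}/_cat/segments/{index}?format=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

For example, `segments_per_shard(fetch_segment_rows("geonames2-0"))` should show a count of 1 for each shard copy once a merge down to a single segment has finished.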

Ali_Reza_Haghighat · August 3, 2019, 2:31pm

Excuse me, it was my fault!!!

I had run the _forcemerge request on a frozen index. After unfreezing the index, my _forcemerge request is now being processed. I can also see it using the /_tasks API.

I will try to forcemerge all of my indices and then retry my experiment.

BTW, it would be better if running _forcemerge on a frozen index returned a warning or an error!
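For anyone scripting this, the order that worked here (unfreeze, forcemerge, freeze again) can be sketched as below. This is a minimal sketch, assuming a local node without authentication and the `_freeze` / `_unfreeze` endpoints of the Elasticsearch version used in this thread; the helper names are made up for illustration.

```python
import urllib.request

# Assumption: a single local node on the default HTTP port, no authentication.
BASE_URL = "http://localhost:9200"


def forcemerge_steps(index, max_num_segments=1):
    """Return the ordered (method, path) calls for merging a frozen index."""
    return [
        ("POST", f"/{index}/_unfreeze"),
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}"),
        ("POST", f"/{index}/_freeze"),
    ]


def run_steps(index, base_url=BASE_URL):
    """Issue the calls one after another and print each HTTP status."""
    for method, path in forcemerge_steps(index):
        req = urllib.request.Request(base_url + path, method=method)
        # No timeout is set, so a long-running merge request is waited on.
        with urllib.request.urlopen(req) as resp:
            print(method, path, resp.status)
```

`run_steps("geonames2-0")` would then process a single index. Since the merge can take a long time on a 3 GB shard, it is worth confirming the segment count before relying on the final freeze step.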

Ali_Reza_Haghighat · August 4, 2019, 12:49pm

The _forcemerge operation decreased the JVM heap usage of my indices dramatically (even when they are not frozen).

Here we have a rule of thumb for computing required JVM heap based on the number of shards:

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured

I just wonder whether there is a similar rule of thumb for forcemerged shards. In other words, I think that for forcemerged indices we could have, for example, 50 shards per GB of heap.
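To make the arithmetic concrete for the node in this thread (6 GB heap, 200 shards), here is a small sketch. The function names are made up for illustration, and the 50 shards per GB figure is only the hypothetical value from the question above, not a documented recommendation.

```python
def max_shards_for_heap(heap_gb, shards_per_gb=20):
    """Upper bound on shards per node under an N-shards-per-GB rule of thumb."""
    return int(heap_gb * shards_per_gb)


def shard_density(shard_count, heap_gb):
    """Shards per GB of configured heap."""
    return shard_count / heap_gb


print(max_shards_for_heap(6))             # 120 with the quoted 20 shards/GB rule
print(max_shards_for_heap(6, 50))         # 300 with the hypothetical 50 shards/GB
print(round(shard_density(200, 6), 1))    # 33.3 shards/GB on the test node
```

So the test node, at about 33 shards per GB, is above the quoted guideline of 20 but would fit under a 50 shards per GB limit if such a limit were valid for forcemerged shards.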

Christian_Dahlqvist · August 4, 2019, 4:40pm

It is cluster state size as well as heap usage that drives this recommendation, and it assumes best practices are being followed, which includes forcemerging indices that are no longer written to down to a single segment.

Ali_Reza_Haghighat · August 5, 2019, 2:17pm

Thanks Christian!

system · September 2, 2019, 2:17pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
