Increased memory requirements for geo search queries in Elasticsearch 8

After upgrading from Elasticsearch 7.17.10 to Elasticsearch 8.7.1, our geo search workloads started
to trip the memory circuit breaker, leading to an unstable cluster (the cluster rejects requests,
which degrades our services). We had to stop the problematic workloads until we found a solution.

We ran deterministic benchmarks on both the ES7 and ES8 setups, collecting
Elasticsearch server metrics, with the following results.

Workload:

  • two consecutive geo queries from our workload (details in the appended file)
  • run asynchronously in parallel (approx. 20 virtual users)

PSA setup (same for ES7 and ES8):

- primary   r6g.large (16 GiB RAM, 2 vCPU, 8 GiB max JVM heap size)
- secondary r6g.large (16 GiB RAM, 2 vCPU, 8 GiB max JVM heap size)
- arbiter   t4g.small  (2 GiB RAM, 2 vCPU, 1 GiB max JVM heap size)

Results for Elasticsearch 7.17.10:

Search rate                approx.   47 ops/s
Search latency             approx.  100 ms
JVM memory used            approx.  3.6 GB
JVM GC collection run time approx.   17 ms

Results for Elasticsearch 8.7.1:

Search rate                approx.   80 ops/s
Search latency             approx.  220 ms
JVM memory used            approx.  7.6 GB
JVM GC collection run time approx. 25.4 ms

From the metrics it is clear that Elasticsearch 8 is much faster but has significantly higher memory requirements.

We would like to know the reason for this memory increase and whether there is a way to tune the memory-speed trade-off directly on the Elasticsearch server.

We can imagine that the increase is connected to new enhancements or upgrades (Java and Lucene), or even to a bug in Elasticsearch 8.

Hello,

I am puzzled too about this memory increase, as there have been no changes to the Lucene implementation, as far as I know, that could explain it. Would it be possible to share more details on what kind of geo queries you are performing?

Comparisons of the profile of the queries and hot threads would be very useful too.


@Tomas_Bartek

Curious: for your tests/benchmarks, did you make sure both indices you searched against have the exact same mapping AND had been merged down to the exact same number of segments? Otherwise the test may not be as comparable as thought.
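For example, one way to get both indices into a comparable shape (a sketch only; the index name here is just borrowed from later in this thread) is to force merge each index down to a single segment and then verify the segment counts with the segments API:

```json
POST /ruian-scratch/_forcemerge?max_num_segments=1

GET /ruian-scratch/_segments
```

Run the same two calls on the ES7 and ES8 clusters and compare the `num_committed_segments` values before benchmarking.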


@stephenb

Hi Stephen, thank you for your suggestions. I have run the same benchmarks once more, now with both indices having the same number of segments and against the exact same mapping, and the results are the same as before (Elasticsearch 8 has significantly higher used JVM heap memory). I will send detailed metrics from these benchmarks.

The best thing would be to have heap dumps for both workloads so they can be compared.


Here are metrics from Elasticsearch 7:

Metrics from Elasticsearch 8:

Sure, I will prepare it.

@Ignacio_Vera

I have prepared heap dumps for both workloads, with the following command:
sudo /usr/share/elasticsearch/jdk/bin/jcmd 877 GC.heap_dump /tmp/heap_dump_es7_1.hprof

Elasticsearch7:
heap_dump_es7_1 (dumped before workload)
heap_dump_es7_2 (dumped after start of the workload)
heap_dump_es7_3 (dumped in the middle of the workload)
heap_dump_es7_4 (dumped before the workload finishes)

Elasticsearch8:

heap_dump_es8_2 (dumped after start of the workload)
heap_dump_es8_3 (dumped in the middle of the workload)
heap_dump_es8_4 (dumped before the workload finishes)

Here is a link to files: heap_dump_files

The dump was always taken on the master node (the node with increased memory in the ES8 cluster). I have checked some of the dumps with the MAT tool.

Please tell me if you need any other (different) dumps, or our indexed data.

Are you querying geo_point or geo_shape fields? In case of the latter, are they big geometries? And how many documents is your query returning? (You never shared the workload you are using.)

@Ignacio_Vera

Sorry, these two queries are called from the workload for various coordinates:

First query:

POST ruian-scratch,geodata-sk-scratch/_search
{
    "query": {
        "constant_score": {
            "filter": {
                "geo_shape": {
                    "geometry": {
                        "shape": {
                            "type": "point",
                            "coordinates": [49.455864, 15.146864]
                        },
                        "relation": "contains"
                    }
                }
            }
        }
    },
    "_source": ["level", "name"],
    "size": 100
}

The result from the first query is used to fill the terms in the second query:

POST ruian-scratch,geodata-sk-scratch/_search
{
    "query": {
        "bool": {
            "filter": [
                {"term": {"address.neighborhood.keyword": "Lipice"}},
                {"term": {"address.region.keyword": "Kraj Vysočina"}},
                {"term": {"address.district.keyword": "Pelhřimov"}},
                {"term": {"address.country.keyword": "Česká republika"}},
                {"term": {"address.city.keyword": "Pelhřimov"}},
                {"term": {"address.cadastral_area.keyword": "Lipice"}},
                {"term": {"level": "address_point"}},
                {
                    "geo_shape": {
                        "geometry": {
                            "shape": {
                                "type": "circle",
                                "coordinates": [15.146864, 49.455864],
                                "radius": "5000m"
                            },
                            "relation": "within"
                        }
                    }
                }
            ]
        }
    },
    "_source": ["address"],
    "sort": [{
        "_geo_distance": {
            "representative_point": [15.146864, 49.455864],
            "order": "asc",
            "unit": "m",
            "mode": "min",
            "distance_type": "plane"
        }
    }],
    "size": 1
}

I will send you the k6 script for the whole workload.

Thank you @Tomas_Bartek .

The heap dumps you have provided do not show the JVM under heap pressure; they are pretty small, so I am confused why you are getting those circuit breaker exceptions. Would you be able to share those exceptions? I am interested to know which circuit breaker is actually triggering.

Just to make sure: the data in both clusters was indexed in the same version, i.e. the data in the 7.17 cluster does not come from an index upgraded from an earlier version. Could you confirm?


Sure @Ignacio_Vera

Under workload, the Elasticsearch logs quite often start to contain these two log events:

{
   "@timestamp":"2023-06-06T16:29:03.294Z",
   "log.level":"INFO",
   "message":"attempting to trigger G1GC due to high heap usage [7859992840]",
   "ecs.version":"1.2.0",
   "service.name":"ES_ECS",
   "event.dataset":"elasticsearch.server",
   "process.thread.name":"elasticsearch[x-elastic2-es8.xx][transport_worker][T#2]",
   "log.logger":"org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService",
   "elasticsearch.cluster.uuid":"B5_WKX7-R8OvOAZwTy3DEg",
   "elasticsearch.node.id":"BAqS8hb9RaqifRMuKYVoKw",
   "elasticsearch.node.name":"x-elastic2-es8.xx",
   "elasticsearch.cluster.name":"x-cluster0-es8"
}
{
   "@timestamp":"2023-06-06T16:29:03.425Z",
   "log.level":"INFO",
   "message":"GC did bring memory usage down, before [7859992840], after [4110292480], allocations [87], duration [131]",
   "ecs.version":"1.2.0",
   "service.name":"ES_ECS",
   "event.dataset":"elasticsearch.server",
   "process.thread.name":"elasticsearch[x-elastic2-es8.xx][transport_worker][T#2]",
   "log.logger":"org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService",
   "elasticsearch.cluster.uuid":"B5_WKX7-R8OvOAZwTy3DEg",
   "elasticsearch.node.id":"BAqS8hb9RaqifRMuKYVoKw",
   "elasticsearch.node.name":"x-elastic2-es8.xx",
   "elasticsearch.cluster.name":"x-cluster0-es8"
}

And sometimes also this log event (approx. once per 10 minutes of workload):

{
   "@timestamp":"2023-06-06T16:29:09.324Z",
   "log.level":"WARN",
   "message":"failed to retrieve stats for node [zMZlF4RNT9eFWD0x7fQgaA]",
   "ecs.version":"1.2.0",
   "service.name":"ES_ECS",
   "event.dataset":"elasticsearch.server",
   "process.thread.name":"elasticsearch[x-elastic2-es8.xx][generic][T#3]",
   "log.logger":"org.elasticsearch.cluster.InternalClusterInfoService",
   "elasticsearch.cluster.uuid":"B5_WKX7-R8OvOAZwTy3DEg",
   "elasticsearch.node.id":"BAqS8hb9RaqifRMuKYVoKw",
   "elasticsearch.node.name":"x-elastic2-es8.xx",
   "elasticsearch.cluster.name":"x-cluster0-es8",
   "error.type":"org.elasticsearch.transport.RemoteTransportException",
   "error.message":"[x-elastic1-es8.xxx][10.0.57.14:9300][cluster:monitor/nodes/stats[n]]",
   "error.stack_trace":"org.elasticsearch.transport.RemoteTransportException: [x-elastic1-es8.xxx][10.0.57.14:9300][cluster:monitor/nodes/stats[n]]\nCaused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [cluster:monitor/nodes/stats[n]] would be [7859247260/7.3gb], which is larger than the limit of [7857609113/7.3gb], real usage: [7859246184/7.3gb], new bytes reserved: [1076/1kb], usages [eql_sequence=0/0b, model_inference=0/0b, inflight_requests=1076/1kb, request=0/0b, fielddata=0/0b]\n\tat org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:414)\n\tat org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:109)\n\tat org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:215)\n\tat org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:119)\n\tat org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:147)\n\tat org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121)\n\tat org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86)\n\tat org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlan(NioEventLoop.java:689)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.lang.Thread.run(Thread.java:1623)\n"
}

This correlates with these circuit breaker Prometheus metrics:
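For reference, the raw breaker counters behind those Prometheus metrics can also be read directly from the node stats API (a standard endpoint, shown here only as a sketch of how we cross-checked the exporter values):

```json
GET /_nodes/stats/breaker
```

The `parent` entry in the response shows the estimated size, the configured limit, and the `tripped` count per node.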

@Ignacio_Vera

Regarding data:

Elasticsearch8

  • Data in the testing/benchmark Elasticsearch v8.7.1 cluster was reindexed from the remote production Elasticsearch v8.7.1 cluster.

  • Data in the production Elasticsearch v8.7.1 cluster was reindexed from the remote Elasticsearch v7.17 (former production) cluster during the upgrade process (ES 7.17 -> ES 8.7.1).

  • Initially we ran tests in the testing/benchmark Elasticsearch v8.7.1 cluster also on data simply recovered from the production v8.7.1 snapshot without reindexing (with the same results).

Elasticsearch7

  • Indexes in the testing/benchmark Elasticsearch v7.17 cluster were recovered from snapshots of the former production Elasticsearch v7.17 backups and locally reindexed into newly created indexes on the same cluster (to have the exact same mappings).

I think it is the real memory circuit breaker that is actually tripping. You might want to check your JVM settings to see if there is something unusual or different between the two deployments, but I guess you are using the defaults.
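If you do want a knob for the memory-speed trade-off on the server, the parent breaker limit is dynamically adjustable (a sketch only; with real memory tracking enabled the default is 95% of the heap, and raising it mostly trades breaker trips for GC pressure, so treat it as a mitigation, not a fix):

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "95%"
  }
}
```

The related `indices.breaker.total.use_real_memory` setting is static and would have to go into elasticsearch.yml; check the circuit breaker documentation for your exact version before changing either.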

What I think is happening is that your documents are pretty big; in particular, your shapes are big polygons. When you fetch those documents you are filtering the source, for example via "_source": ["level", "name"]. Internally, Elasticsearch still needs to read the whole document, uncompress it, and convert it to a Map in order to fetch those attributes. This is pretty expensive, especially when the document contains a massive GeoJSON geometry. You can see that in the hot threads of the 7 deployment:

	com.fasterxml.jackson.core.util.TextBuffer.contentsAsDouble(TextBuffer.java:533)
	com.fasterxml.jackson.core.base.ParserBase._parseSlowFloat(ParserBase.java:911)
	com.fasterxml.jackson.core.base.ParserBase._parseNumericValue(ParserBase.java:860)
	com.fasterxml.jackson.core.base.ParserBase.getNumberValue(ParserBase.java:609)
	org.elasticsearch.xcontent.json.JsonXContentParser.numberValue(JsonXContentParser.java:172)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readValueUnsafe(AbstractXContentParser.java:389)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readListUnsafe(AbstractXContentParser.java:365)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readValueUnsafe(AbstractXContentParser.java:397)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readListUnsafe(AbstractXContentParser.java:365)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readValueUnsafe(AbstractXContentParser.java:397)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readListUnsafe(AbstractXContentParser.java:365)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readValueUnsafe(AbstractXContentParser.java:397)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readMapEntries(AbstractXContentParser.java:318)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readValueUnsafe(AbstractXContentParser.java:394)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readMapEntries(AbstractXContentParser.java:318)
	org.elasticsearch.xcontent.support.AbstractXContentParser.readMapSafe(AbstractXContentParser.java:304)
	org.elasticsearch.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:254)
	org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:210)
	org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:138)
	org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:106)
	org.elasticsearch.search.lookup.SourceLookup.sourceAsMapAndType(SourceLookup.java:93)
	org.elasticsearch.search.lookup.SourceLookup.source(SourceLookup.java:69)
	org.elasticsearch.search.lookup.SourceLookup.filter(SourceLookup.java:188)
	org.elasticsearch.search.fetch.subphase.FetchSourcePhase$1.hitExecute(FetchSourcePhase.java:70)
	org.elasticsearch.search.fetch.subphase.FetchSourcePhase$1.process(FetchSourcePhase.java:54)
	org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:172)
	org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:93)
	org.elasticsearch.search.SearchService.lambda$executeFetchPhase$7(SearchService.java:793)

This code path is very different in Elasticsearch 8, mainly because the API changed in Lucene. Maybe this change has made handling big documents less efficient; I would need to try to reproduce it:

	org.apache.lucene.util.compress.LZ4.decompress(LZ4.java:140)
	org.apache.lucene.codecs.lucene90.LZ4WithPresetDictCompressionMode$LZ4WithPresetDictDecompressor.decompress(LZ4WithPresetDictCompressionMode.java:133)
	org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState$1.fillBuffer(Lucene90CompressingStoredFieldsReader.java:556)
	org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState$1.readBytes(Lucene90CompressingStoredFieldsReader.java:575)
	org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.readField(Lucene90CompressingStoredFieldsReader.java:244)
	org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.document(Lucene90CompressingStoredFieldsReader.java:642)
	org.apache.lucene.index.SegmentReader.document(SegmentReader.java:253)
	org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:404)
	org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:404)
	org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader.document(FieldUsageTrackingDirectoryReader.java:141)
	org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:404)
	org.elasticsearch.index.fieldvisitor.StoredFieldLoader$$Lambda$8282+0x000000080250fb70.accept(Native method)
	org.elasticsearch.index.fieldvisitor.StoredFieldLoader$ReaderStoredFieldLoader.advanceTo(StoredFieldLoader.java:172)
	org.elasticsearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:235)
	org.elasticsearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:204)
	org.elasticsearch.search.fetch.FetchPhase$1.nextDoc(FetchPhase.java:143)
	org.elasticsearch.search.fetch.FetchPhaseDocsIterator.iterate(FetchPhaseDocsIterator.java:70)
	org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:161)
	org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:76)
	org.elasticsearch.search.SearchService.lambda$executeFetchPhase$7(SearchService.java:811)

If my suspicion is right and you have huge documents due to the geometry, and you only want to retrieve a small part of each document, you might want to consider storing those fields and retrieving them directly, without having to read the whole _source. There is more in the documentation; you just need to replace "A very long content field..." with "A very large geometry...".
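A minimal sketch of that approach (the index name is hypothetical, the field names are taken from the queries earlier in this thread, and the keyword types are an assumption; adjust everything to the real mapping):

```json
PUT /ruian-scratch-stored
{
  "mappings": {
    "properties": {
      "level":    { "type": "keyword", "store": true },
      "name":     { "type": "keyword", "store": true },
      "geometry": { "type": "geo_shape" }
    }
  }
}

POST /ruian-scratch-stored/_search
{
  "query": { "match_all": {} },
  "stored_fields": ["level", "name"],
  "_source": false
}
```

With "stored_fields", the fetch phase reads only those stored values, instead of decompressing and parsing the entire _source (which contains the large geometry) just to extract two small attributes.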


@Ignacio_Vera Thank you, I will try that idea with the store mapping parameter.

JVM settings:
Yes, we use the defaults on both the ES7 and ES8 setups. I have checked the parameters with GET _nodes/jvm and everything looks OK.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.