High CPU load when using vector tile search API

Hi everyone,
We use Elasticsearch mainly through the vector tile search API and we are trying to figure out the best way to configure it for better performance.
We have a CPU-optimized deployment on Elastic Cloud with a 720 GB storage | 16 GB RAM | 8 CPU configuration.
Clearly the 720 GB is too much for us (we use about 34 GB), but as soon as we have concurrent connections using the vector tile search API, the CPU quickly goes high and requests that usually take a few milliseconds start taking a long time.
Here are some metrics I get when this is happening:

During these periods we also see GC overhead warnings in the logs:

[elasticsearch.server][INFO] {"type": "server", "timestamp": "2022-06-28T09:32:04,919Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "4603b8c33b92410fbc21301d4b19091b", "node.name": "instance-0000000000", "message": "[gc][1794267] overhead, spent [280ms] collecting in the last [1s]", "cluster.uuid": "LWOpg7n-SwSwav-tiM3rFQ", "node.id": "UursvdJ2QXu0UzYJhGl4QQ"  }

I'm guessing that creating the tiles takes a lot of CPU, but I don't know if there is a way to adapt the configuration when ES is mainly used for the vector tile search API.

Any tips? Maybe @Ignacio_Vera?

Anyway thanks a lot for this new feature, it's a game changer for us.

All the best,

Thibault

Hey @thibaultclem,

Could you provide the hot threads output while running the vector tile search API concurrently? Let's see where it is spending most of the time. In addition, what is the layout of your index, e.g. the number of shards?

Hi @Ignacio_Vera,
Thanks for your comments!
Here is the hot threads output:

::: {instance-0000000000}{UursvdJ2QXu0UzYJhGl4QQ}{JwdJU42xQ7OAEzOxvO2sog}{instance-0000000000}{10.43.255.165}{10.43.255.165:19511}{himrst}{xpack.installed=true, data=hot, server_name=instance-0000000000.4603b8c33b92410fbc21301d4b19091b, instance_configuration=gcp.es.datahot.n2.68x32x45, region=unknown-region, availability_zone=europe-west1-c, logical_availability_zone=zone-0}
   Hot threads at 2022-06-29T07:54:15.976Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.0% [cpu=94.7%, other=5.3%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000000][search][T#11]'
     4/10 snapshots sharing following 30 elements
       app/org.elasticsearch.xcontent@8.3.0/org.elasticsearch.xcontent.support.AbstractXContentParser.readMapSafe(AbstractXContentParser.java:312)
       app/org.elasticsearch.xcontent@8.3.0/org.elasticsearch.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:263)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:270)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:177)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:140)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:132)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.lookup.SourceLookup.sourceAsMapAndType(SourceLookup.java:93)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.lookup.SourceLookup.source(SourceLookup.java:69)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.lookup.SourceLookup.extractValue(SourceLookup.java:208)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.index.mapper.SourceValueFetcher.fetchValues(SourceValueFetcher.java:58)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.index.mapper.ValueFetcher.fetchDocumentField(ValueFetcher.java:55)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.subphase.FieldFetcher.fetch(FieldFetcher.java:170)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.subphase.FetchFieldsPhase$1.process(FetchFieldsPhase.java:48)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:175)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:93)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:659)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:634)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:489)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7242/0x0000000801f19740.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7243/0x0000000801f19950.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$$Lambda$6403/0x0000000801d8c4c0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:768)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@18.0.1.1/java.lang.Thread.run(Thread.java:833)
     3/10 snapshots sharing following 30 elements
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.LZ4WithPresetDictCompressionMode$LZ4WithPresetDictDecompressor.decompress(LZ4WithPresetDictCompressionMode.java:132)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState.document(Lucene90CompressingStoredFieldsReader.java:595)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.document(Lucene90CompressingStoredFieldsReader.java:610)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.visitDocument(Lucene90CompressingStoredFieldsReader.java:628)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.CodecReader.document(CodecReader.java:89)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader.document(FieldUsageTrackingDirectoryReader.java:123)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase$$Lambda$7357/0x0000000801f3e0f0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:468)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:323)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:277)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:163)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:93)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:659)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:634)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:489)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7242/0x0000000801f19740.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7243/0x0000000801f19950.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$$Lambda$6403/0x0000000801d8c4c0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:768)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@18.0.1.1/java.lang.Thread.run(Thread.java:833)
     2/10 snapshots sharing following 17 elements
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:175)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:93)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:659)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:634)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:489)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7242/0x0000000801f19740.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7243/0x0000000801f19950.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$$Lambda$6403/0x0000000801d8c4c0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:768)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@18.0.1.1/java.lang.Thread.run(Thread.java:833)
     unique snapshot
       java.base@18.0.1.1/sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
       java.base@18.0.1.1/sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:54)
       java.base@18.0.1.1/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:338)
       java.base@18.0.1.1/sun.nio.ch.IOUtil.read(IOUtil.java:306)
       java.base@18.0.1.1/sun.nio.ch.IOUtil.read(IOUtil.java:283)
       java.base@18.0.1.1/sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:839)
       java.base@18.0.1.1/sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:824)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:291)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:55)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.store.DataInput.readVInt(DataInput.java:121)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:183)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState.doReset(Lucene90CompressingStoredFieldsReader.java:439)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState.reset(Lucene90CompressingStoredFieldsReader.java:424)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.document(Lucene90CompressingStoredFieldsReader.java:607)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader.visitDocument(Lucene90CompressingStoredFieldsReader.java:628)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.CodecReader.document(CodecReader.java:89)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader.document(FieldUsageTrackingDirectoryReader.java:123)
       app/org.apache.lucene.core@9.2.0/org.apache.lucene.index.FilterLeafReader.document(FilterLeafReader.java:381)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase$$Lambda$7357/0x0000000801f3e0f0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:468)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.prepareNonNestedHitContext(FetchPhase.java:323)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:277)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.buildSearchHits(FetchPhase.java:163)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:93)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:659)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:634)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:489)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7242/0x0000000801f19740.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.search.SearchService$$Lambda$7243/0x0000000801f19950.get(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$$Lambda$6403/0x0000000801d8c4c0.accept(Unknown Source)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:768)
       app/org.elasticsearch.server@8.3.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@18.0.1.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@18.0.1.1/java.lang.Thread.run(Thread.java:833)

Here are the settings of the index that gets the "timeout":

{
  "land-v2": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "blocks": {
          "read_only_allow_delete": "false"
        },
        "provided_name": "land-v2",
        "creation_date": "1651152476726",
        "number_of_replicas": "0",
        "uuid": "Fyfg9wYGSFmG_gY0kqf4WQ",
        "version": {
          "created": "8010399"
        }
      }
    }
  }
}

On disk this index takes about 10 GB:

green  open land-v2                                       Fyfg9wYGSFmG_gY0kqf4WQ 1 0 16135142 3791001  10.4gb  10.4gb

Let me know if you need anything else from my side.

Thanks a lot

First, I would like to give three tips to improve the performance of the vector tile search API. Some might not be relevant for you, but they are still good to write down:

  1. Don’t generate unnecessary layers: If you are only working with the hits layer, make sure to set grid_precision to 0. Likewise, if you are only working with the aggs layer, make sure to set the size parameter to 0.

  2. Consider overriding the default sorting: The documents of the hits layer are sorted by the length of the bounding box diagonal of the geometries, to make sure the biggest shapes are included in the final tile. This is done using a Painless script, so it is not the most efficient way to sort. For example, if your documents contain the area of the geometries, you may be better off sorting on that field, as it should be more efficient.

  3. Prefer WKT over GeoJSON: This is an area we want to improve, but at the moment it is better to encode your geometries using WKT instead of GeoJSON. The reason is that in order to read the geometries from _source, we build an object model of the document in memory (see the method SourceLookup#sourceAsMapAndType in the hot threads). This means that for GeoJSON we create many small objects, which is not GC friendly. WKT, on the other hand, creates only one (probably big) string object, which is in most cases more GC friendly.
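As a concrete sketch, a hits-only tile request applying tips 1 and 2 might look like this (the index name, geometry field, `area` field, and tile coordinates are all illustrative):

```shell
# Request a vector tile at zoom 6, x=32, y=22 from a hypothetical "land" index
# whose documents have a "geometry" geo_shape field and a precomputed "area" field.
curl -X POST "localhost:9200/land/_mvt/geometry/6/32/22" \
  -H 'Content-Type: application/json' -d'
{
  "grid_precision": 0,
  "size": 10000,
  "track_total_hits": false,
  "sort": [ { "area": { "order": "desc" } } ]
}'
```

Setting grid_precision to 0 skips generating the aggs layer entirely, and sorting on an indexed numeric field avoids the default Painless script sort.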

Maybe tips two and three can help your use case.

In addition, during my tests I noticed that splitting your data into more shards might help latency, as the fetch phase is parallelized across shards. That is something you might want to try, maybe with two or at most three shards, to see if it helps your workload.
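Since the number of primary shards cannot be changed on an existing index, one way to test this is to create a new index with more shards and reindex into it (the target index name below is illustrative, and you would copy your existing mappings as well):

```shell
# Create a new index with 3 primary shards.
curl -X PUT "localhost:9200/land-v3" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 0 }
}'

# Copy the documents from the old index into the new one.
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "land-v2" },
  "dest":   { "index": "land-v3" }
}'
```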

1 Like

By the way, if you are happy with the way the API sorts your data, you can speed it up by indexing the value as a runtime field. Look into the rally track for how to do it:
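As a rough sketch of the mechanism (not the actual rally track definition), a runtime field can expose a precomputed sort key in the mapping; the field name and script here are purely illustrative:

```shell
# Hypothetical sketch: define a "sort_key" runtime field on a "land" index,
# emitting a value from an existing numeric doc-values field.
curl -X PUT "localhost:9200/land/_mapping" -H 'Content-Type: application/json' -d'
{
  "runtime": {
    "sort_key": {
      "type": "double",
      "script": { "source": "emit(doc[\"area\"].value)" }
    }
  }
}'
```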

Hi @Ignacio_Vera,
Thanks a lot for your tips.

Concerning 1., yes, we use only the hits layer and our configuration is as follows:

    exact_bounds: true,
    extent: 4096,
    grid_precision: 0, // don't need the aggs layer
    size: 10000,
    track_total_hits: false,

Concerning 2., as you suggest, we will sort by our "area" field, indexed as an integer.

For 3., that's good to know, I didn't know about it. It will take us a little more time to change, but I will let you know soon.
Same for the sharding: we will run some tests after increasing to 3 shards.

I will try to let you know as soon as we manage to do it.

Thanks a lot for your help,

Thibault

2 Likes

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.