Very High Latencies For GeoDistance query for 10k documents

I have a requirement to fetch 10k documents around a lat/long within a 10 km radius. The query below works fine but takes around 350 ms to fetch all the documents. Is there any other way to improve on this? I was expecting latencies of less than 200 ms.

Query :

{
  "from": 0,
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "must": [
              {
                "geo_distance": {
                  "distance": 10000,
                  "latLong": [
                    77.6321434,
                    12.9354922
                  ],
                  "ignore_unmapped": false,
                  "distance_type": "arc",
                  "validation_method": "STRICT"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "track_total_hits": false,
  "track_scores": false,
     "_source": {
    "includes": "restId"
  },
  "sort": [
    {
      "_geo_distance": {
        "latLong": [
          77.6321434,
          12.9354922
        ],
        "order": "asc",
        "unit": "m",
        "mode": "min"
      }
    }
  ]
}

My mappings

    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "id" : {
          "type" : "keyword"
        },
        "latLong" : {
          "type" : "geo_point"
        },
        "restId" : {
          "type" : "long"
        },
        "version" : {
          "type" : "long"
        }
      }
    }

What is the hardware specification of the cluster? What is the size of the index/indices you are querying? How many shards are in these indices?

Which version of Elasticsearch are you using?

version : 7.9.1
size of index : 50MB
Number of shards : 2
cluster configuration
Number of master nodes : 3
master node ec2 instance : m5.large
Number of Data nodes : 2
data node ec2 instance : c5.4xlarge

What type of storage are you using? How much storage does each node have? How much data does each node hold?

Storage Type : EBS
Total Space : 198 GB
Available Space : 138 GB

What type of EBS storage? How much disk space does each data node have? Do you have anything apart from Elasticsearch running on the data nodes?

@Parth_Narang Welcome to the community!

I suspect the vast majority of that 350 milliseconds is the actual marshalling and construction of the response JSON document.

And where is that 350 milliseconds measured from?

You can run it in the profiler and see how long it's actually taking.
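
Something along these lines, e.g. from Kibana Dev Tools (my-geo-index is just a placeholder for your index name); adding "profile": true to the body returns a per-shard timing breakdown of the query phase alongside the hits. Keep in mind it times query execution on the shards, not the serialisation and transfer of the response:

GET /my-geo-index/_search
{
  "profile": true,
  "size": 10000,
  "track_total_hits": false,
  "_source": { "includes": "restId" },
  "query": {
    "geo_distance": {
      "distance": "10km",
      "latLong": [ 77.6321434, 12.9354922 ]
    }
  }
}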

But 10,000 documents, even though they're small is going to take time to marshall and transmit.

Not that this matters much, but that's 100 km, not 10 km:

"distance": 100000

Also, if that is the actual query, you could try

"distance_type" : "plane"

to speed it up.

Curious what the took says

And also curious how you arrived at 200 milliseconds being a good target for the 10,000 documents. Did you get that performance before?

Do you need all 10,000 documents in a single response or could you page through them?
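
If paging would work for you, here is a rough sketch using search_after, with the id keyword field from your mapping as a tiebreaker (my-geo-index is a placeholder index name):

GET /my-geo-index/_search
{
  "size": 1000,
  "track_total_hits": false,
  "_source": { "includes": "restId" },
  "query": {
    "geo_distance": {
      "distance": "10km",
      "latLong": [ 77.6321434, 12.9354922 ]
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "latLong": [ 77.6321434, 12.9354922 ],
        "order": "asc",
        "unit": "m"
      }
    },
    { "id": "asc" }
  ]
}

Each hit comes back with a "sort" array; for the next page you repeat the same request with "search_after" set to the sort values of the last hit of the previous page (the distance and the id), keeping "from" at 0. Smaller pages keep each response quick, at the cost of more round trips.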

Also, 7.9.1 is very old. There have been a number of performance improvements; we're all the way up at 8.4 now, and there have been significant underlying changes to the Lucene engine. I would also try upgrading if you can.

And as I'm sure Christian was getting at, EBS is not the best storage for high IO. Not that it won't work, but it's certainly not the lowest-latency option. On the other hand, if you keep hitting those indices and there's not much else going on, most of those indices could be in RAM... but if it's a busy cluster, that won't be the case.

Hard to say...


EBS Type : Provisioned IOPS (SSD)
Total Disk Space : 200 GB, of which 140 GB is available
No, only Elasticsearch is running on the data nodes

Hey @stephenb, thanks.
We tried hitting the cluster with our microservice at sufficient throughput and also tried a few requests from Kibana. I corrected the distance; it was a typo.
There was no improvement from changing
"distance_type" : "plane"
The value in took is always more than 300 ms.
Not really; we just thought 200 ms would be enough for this operation based on the data we had for other operations, though the response size was smaller for those. So it does seem like the majority of the time is spent in marshalling and transmission.
Upgrading is not an option for now, but thanks for the recommendation.
This is a new cluster we created to support only this operation, so no other indices would be in RAM, and the latencies are still high, which means storage is most probably not the issue.
Btw, we ran our experiment for 1 hour; will the latencies improve if we increase the time period?

Not much more guidance... Keep tuning and/or try a newer version; you are way, way behind.

You could look at the index codec / compression... See if best speed helps...
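
If you want to experiment with that, roughly like this when creating a new index (my-geo-index-v2 is a placeholder). index.codec is a static index setting that only applies to newly written segments, so it is normally set at index creation and the data reindexed; "default" is the LZ4 / best-speed codec, while "best_compression" trades query speed for disk space:

PUT /my-geo-index-v2
{
  "settings": {
    "index": {
      "codec": "default"
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "id": { "type": "keyword" },
      "latLong": { "type": "geo_point" },
      "restId": { "type": "long" },
      "version": { "type": "long" }
    }
  }
}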

Still think 300ms is not unreasonable to fetch 10K records ... Actually seems pretty fast to me....


One option to try to help reduce some latency: if you only need parts of your documents returned (rather than the whole document), you can look into filter_path and only return what is needed. This would, in theory, reduce the return payload to the microservice querying Elasticsearch.
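
Roughly like this, reusing the query from above (my-geo-index is a placeholder index name); filter_path is a query-string parameter that prunes the response to only the listed paths, so the per-hit metadata (_index, _id, _score and so on) that _source filtering alone leaves in is dropped from the response:

GET /my-geo-index/_search?filter_path=hits.hits._source.restId
{
  "size": 10000,
  "track_total_hits": false,
  "_source": { "includes": "restId" },
  "query": {
    "geo_distance": {
      "distance": "10km",
      "latLong": [ 77.6321434, 12.9354922 ]
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "latLong": [ 77.6321434, 12.9354922 ],
        "order": "asc",
        "unit": "m"
      }
    }
  ]
}

The response then contains nothing but hits.hits[*]._source.restId, so there is noticeably less JSON to build and transfer.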
