Is Geo query scalable in terms of query time?
I indexed 38 thousand geo points and ran a k-nearest-neighbor query on them. I found that the "took" value in each response gets larger (the query gets slower) when going from a cluster with one data node to a cluster with four data nodes.
38k geo points is not much data, so distributing a query across the nodes might take more time than executing it. Also, by default, queries are sent round-robin around the cluster, even if the data is stored locally, in order to prevent query hotspots.
You also haven't mentioned anything about your setup (number of nodes, shards, queries, etc.), so it is really hard to make a concrete statement here.
I have these 38K geo points (~43G) in 4 shards, with no replicas. The cluster consists of three master nodes, four data nodes, and one client node, each a virtual machine with a 4-core 2.2 GHz Xeon and 16 GB RAM. Also, there is a single node with no dedicated role and the same hardware.
The query filters by postal code as a keyword, then matches the street as a string. I also executed point-in-polygon, k-nearest-neighbor, and aggregation-by-geohash queries. I sent these queries synchronously via elasticsearch.js and calculated the average of the "took" field across the query responses.
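To make the measurement concrete, here is a minimal sketch of the filtered query body and the "took" averaging described above. The field names `postal_code` and `street`, and the helper names `buildAddressQuery` and `averageTook`, are assumptions for illustration and not taken from the actual mapping or code. The sketch only builds the request body and averages already-collected responses; it does not call the cluster.

```javascript
// Build a search body: filter on postal code (keyword), then match on street.
// NOTE: "postal_code" and "street" are hypothetical field names.
function buildAddressQuery(postalCode, street) {
  return {
    query: {
      bool: {
        filter: [{ term: { postal_code: postalCode } }],
        must: [{ match: { street: street } }]
      }
    }
  };
}

// Average the "took" value (milliseconds) over a list of search responses.
// Returns NaN for an empty list so a missing measurement is not read as 0 ms.
function averageTook(responses) {
  if (responses.length === 0) return NaN;
  const total = responses.reduce((sum, r) => sum + r.took, 0);
  return total / responses.length;
}

// Example with made-up response objects (only the "took" field matters here).
const sample = [{ took: 12 }, { took: 8 }, { took: 10 }];
console.log(averageTook(sample)); // 10
```

Note that "took" covers only the time spent inside Elasticsearch, so an average computed this way excludes network and client overhead.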
Do you mean that if I index more data, the single node's performance will decrease significantly? Also, what kind of metrics should I track in order to see whether Elasticsearch is scalable in our case?