I have a single-node Elasticsearch deployment with 15K documents. The machine has 4 cores and 8 GB of RAM. The node is handling 1,300 requests per second at 25% CPU utilization and 75% memory utilization. In the current deployment the query response time is 100 ms.
We need the search query to run in < 30 ms.
The search query is essentially a geolocation search that fetches documents within x miles of the input lat/lon, with some additional filters, and the documents are sorted by distance (nearest to furthest). Each document has multiple lat/lon points. It seems geo_distance uses only the first lat/lon in the array, so the simple geo_distance filter was not usable in the query.
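Something like the following, assuming an index named search and a geo_point field named locations (names, coordinates, and radius are placeholders):

```
curl -XPOST 'http://localhost:9200/search/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "10mi",
          "locations": { "lat": 40.7, "lon": -74.0 }
        }
      }
    }
  },
  "sort": [
    { "_geo_distance": {
        "locations": { "lat": 40.7, "lon": -74.0 },
        "order": "asc",
        "unit": "mi"
    } }
  ]
}'
```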
Have you tried denormalising and storing a copy of each document for each location, possibly with the array of locations in a separate field if your application needs them? I suspect this would give better performance than using a Groovy script, which you mention in your other post.
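To illustrate the idea (field names are hypothetical), a document with two locations becomes two documents, each with a single-valued geo_point that the filter and sort operate on, while the full array is kept for the application:

```
{ "id": "doc1", "location": { "lat": 40.7, "lon": -74.0 },
  "all_locations": [ { "lat": 40.7, "lon": -74.0 }, { "lat": 34.0, "lon": -118.2 } ] }

{ "id": "doc1", "location": { "lat": 34.0, "lon": -118.2 },
  "all_locations": [ { "lat": 40.7, "lon": -74.0 }, { "lat": 34.0, "lon": -118.2 } ] }
```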
But denormalizing would add more documents to the index; will that impact query performance? In some cases I would have 50 lat/lon points in a document. Will this adversely impact query performance?
Even if all 15k documents have 50 geopoints each, 750,000 documents in an index is still not much (unless the documents are huge). Given your amount of memory I would expect it all to be cached anyway.
In our search query we need to use a geo_range query, and the to and from values are part of the document. In the geo_range query, is it possible to access the to and from values from within the document? We can access these values using a script.
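For illustration, a sketch of that kind of script filter with Groovy scripting (ES 1.x era), assuming hypothetical numeric fields from and to holding the per-document bounds in miles, a geo_point field named location, and that the arcDistanceInMiles accessor of that era's scripting API is available:

```
curl -XPOST 'http://localhost:9200/search/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "def d = doc[\"location\"].arcDistanceInMiles(lat, lon); d >= doc[\"from\"].value && d <= doc[\"to\"].value",
          "params": { "lat": 40.7, "lon": -74.0 }
        }
      }
    }
  }
}'
```

This is exactly the kind of per-document script work the denormalisation advice above aims to eliminate.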
As you suggested, after removing scripts and denormalising the data I can see a reduction in response time. But the response time fluctuates, and I can correlate the increase in response time with an increase in merge rate in Marvel. Whenever the merge rate goes up, the response time increases.
I tried to control the merge rate by increasing the index refresh interval; I raised it to 60s. But I still see spikes in the merge rate.
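(For reference, the refresh interval was raised with the dynamic index settings API, along these lines; the index name is a placeholder:)

```
curl -XPUT 'http://localhost:9200/search/_settings' -d '{
  "index": { "refresh_interval": "60s" }
}'
```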
Merging should happen only when the index is updated. Is this understanding correct?
If you are not continuously indexing, e.g. if you perform periodic bulk uploads, you can force Elasticsearch to consolidate segments through the force merge API. This can be resource intensive in terms of CPU and disk I/O, but once you have consolidated into one segment, there should be no more merging until you index/update/delete more data.
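For example, to consolidate an index down to a single segment (index name is a placeholder):

```
curl -XPOST 'http://localhost:9200/search/_forcemerge?max_num_segments=1'
```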
You also have the option to tune how aggressive you want merging to be by throttling merges.
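On the 1.x series, for example, merge I/O can be capped cluster-wide with the store throttle setting; a sketch, with an illustrative value:

```
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "indices.store.throttle.max_bytes_per_sec": "20mb" }
}'
```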
The API:

```
curl -XPOST 'http://localhost:9200/search/_forcemerge'
```

returns:

```
{"error":"InvalidTypeNameException[mapping type name [_forcemerge] can't start with '_']","status":400}
```
Is this API supported only in newer versions of Elasticsearch?
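(For context: _forcemerge was introduced in Elasticsearch 2.1 as a rename of the older _optimize API, and the error above is what a pre-2.1 node returns when it interprets _forcemerge as a mapping type name. On those older versions the equivalent call is:)

```
curl -XPOST 'http://localhost:9200/search/_optimize?max_num_segments=1'
```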