Need help to improve performance with ES


(Deep-2) #1

Hi,

I have a single-node Elasticsearch deployment with 15K documents. The machine has 4 cores and 8 GB of RAM. The node is handling 1300 requests per second at 25% CPU utilization and 75% memory utilization. In the current deployment the query response time is 100 ms.

We need the search query to run in < 30 ms.

The search query is essentially a geo-location search that fetches documents within x miles of the input lat/lon, with some additional filters, and the documents are sorted on distance (nearest to furthest). Each document has multiple lat/lon points. It seems geo_distance uses only the first lat/lon in the array, so the simple geo_distance filter was not usable in the query.
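For reference, the kind of query body being described (ES 1.x filtered-query DSL with a geo_distance filter and a _geo_distance sort) looks roughly like the sketch below. The field name `locations`, the `status` term filter, and the coordinates are assumptions for illustration, not details from this thread.

```python
import json

# Sketch of a "within x miles, sorted nearest-first" search body (ES 1.x DSL).
# "locations" and the extra "status" term filter are hypothetical names.
query = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"status": "active"}},  # hypothetical extra filter
                        {"geo_distance": {
                            "distance": "5mi",
                            "locations": {"lat": 40.71, "lon": -74.0},
                        }},
                    ]
                }
            }
        }
    },
    "sort": [
        {"_geo_distance": {
            "locations": {"lat": 40.71, "lon": -74.0},
            "order": "asc",   # nearest to furthest
            "unit": "mi",
        }}
    ],
}

print(json.dumps(query, indent=2))
```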

Need help to optimize the query.

Regards,
Deep


(Christian Dahlqvist) #2

Have you tried denormalising and storing a copy of each document for each location, possibly with the array of locations in a separate field if your application needs them? I suspect this would give better performance than using a Groovy script, which you mention in your other post.
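A minimal sketch of that denormalisation step, splitting one document with N locations into N documents that each carry a single searchable point plus the full array in a separate field. The field names and the `-i` id suffix scheme are assumptions:

```python
def denormalise(doc):
    """Split one document with many locations into one document per
    location, keeping the full array in a separate field for the app."""
    for i, point in enumerate(doc["locations"]):
        child = dict(doc)
        child["location"] = point                   # single geo_point used for search
        child["all_locations"] = doc["locations"]   # full array kept for the application
        del child["locations"]
        child["_id"] = f'{doc["_id"]}-{i}'          # hypothetical id scheme
        yield child

source = {"_id": "42", "name": "store", "locations": [
    {"lat": 40.7, "lon": -74.0},
    {"lat": 34.0, "lon": -118.2},
]}
docs = list(denormalise(source))
# two documents, each with one searchable point and the original array
```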


(Deep-2) #3

Hi Christian,
Thank you for your response.

As suggested, let me denormalise and try.

But denormalising would add more documents to the index; will that impact query performance? In some cases I would have 50 lat/lon points in a document. Will this adversely impact query performance?

Regards,
Deep


(Christian Dahlqvist) #4

Even if all 15k documents have 50 geopoints in them, 750,000 documents in an index is still not much (unless the documents are huge). Given your amount of memory I would expect it all to be cached anyway.


(Deep-2) #5

Ok.
In Marvel I can see the data size is 100 MB. Let me denormalise and share the results.

Thanks.


(Deep-2) #6

Hi Christian,

In our search query we need to use the geo_range query, and the to and from values are part of the document. In the geo_range query, is it possible to access the to and from values from within the document? We can access these values using a script.
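Assuming this refers to comparing the computed distance against per-document range fields in ES 1.x, one way is a script filter that reads the document's own `from`/`to` values. The field names here (`location`, `from`, `to`) and the Groovy snippet are illustrative assumptions:

```python
import json

# Sketch of an ES 1.x script filter: compute the distance from the input
# lat/lon and compare it against per-document "from"/"to" fields.
# Field names and the Groovy snippet are assumptions, not from the thread.
script_filter = {
    "filtered": {
        "filter": {
            "script": {
                "script": (
                    "def d = doc['location'].distanceInMiles(lat, lon); "
                    "d >= doc['from'].value && d <= doc['to'].value"
                ),
                "params": {"lat": 40.71, "lon": -74.0},
            }
        }
    }
}
print(json.dumps(script_filter, indent=2))
```

Note that the earlier advice in this thread was to avoid scripts for performance, so this is the fallback rather than the preferred approach.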

Regards,
Deep


(Deep-2) #7

Hi Christian,

As suggested by you, after removing scripts and denormalising the data I can see a reduction in response time. But the response time fluctuates, and I can correlate the increase in response time with an increase in the merge rate in Marvel. Whenever the merge rate goes up, the response time increases.

I tried to control the increase in merge rate by increasing the index refresh interval. I increased it to 60s, but I still see spikes in the merge rate.
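For the record, the 60s refresh-interval change above corresponds to a settings body like the following (applied with PUT to the index's _settings endpoint); the index name is whatever your index is called:

```python
import json

# Settings body for raising the refresh interval to the 60s value used above.
# Applied via: PUT /<index>/_settings
settings = {"index": {"refresh_interval": "60s"}}

print(json.dumps(settings, indent=2))
```

A longer refresh interval creates new segments less often, but merging is triggered by segment creation from any indexing, so spikes can still occur while documents are being written.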

Merging should happen only when the index is updated. Is this understanding correct?

How can I control the merge rate?

Regards,
Deep


(Christian Dahlqvist) #8

If you are not continuously indexing, e.g. if you perform periodic bulk uploads, you can force Elasticsearch to consolidate segments through the force merge API. This can be resource intensive in terms of CPU and disk I/O, but once you have consolidated into one segment, there should be no more merging until you index/update/delete more data.

You also have the option to tune how aggressive you want merging to be by throttling merges.


(Deep-2) #9

Hi Christian,

I am using Elasticsearch 1.4.

The force merge API returns an error.

The API curl -XPOST 'http://localhost:9200/search/_forcemerge'
{"error":"InvalidTypeNameException[mapping type name [forcemerge] can't start with '']","status":400}

Is this API supported in newer versions of Elasticsearch?

Regards,
Deep


(Deep-2) #10

Hi Christian,

Just figured out that in version 1.4 this API was called optimize.

The API curl -XPOST "http://localhost:9200/search/_optimize?max_num_segments=1" worked.

Regards,
Deep


(Deep-2) #11

Hi Christian,

After running optimize, the index automatically creates multiple segments again. There are no deletes/updates to the index.

What could be the reason for this?

Regards,
Deep


(Christian Dahlqvist) #12

If you look at index statistics, do the number of documents or the number of deleted documents change when segments are created?
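To check that, the docs counts can be read from the index stats response (GET /&lt;index&gt;/_stats/docs). The sample response below is illustrative, using the index name from earlier in the thread:

```python
# Sketch of reading document counts from an ES _stats response.
# The numbers in this sample response are illustrative only.
stats = {
    "indices": {
        "search": {
            "primaries": {
                "docs": {"count": 750000, "deleted": 120}
            }
        }
    }
}

docs = stats["indices"]["search"]["primaries"]["docs"]
print("count:", docs["count"], "deleted:", docs["deleted"])
```

If `count` or `deleted` changes while segments appear, something is still writing to the index; if both are flat, the new segments need another explanation.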

