Definitive guide for tuning using Marvel?

paulkeogh · June 9, 2016, 3:01pm

Is there a definitive guide available for tuning Elasticsearch using Marvel?

I have a 6-node cluster with 5 billion docs running across 2 VMs in VMware's vCloud environment.

A simple aggregation is taking ~40 seconds. How can I use Marvel to investigate this latency in a formal and methodical fashion?

Thanks,

pickypg · June 16, 2016, 7:25pm

There is no guide -- at least not yet -- but options vary based on the version of Elasticsearch that you're running.

For example, ES 2.3 (not Marvel) supports a new query profiler. This could be immensely useful, but it does not yet profile aggregation performance. Even so, it will help to see the performance of the raw query that you're performing, which is a prerequisite to the overall aggregation.
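As a rough illustration of what turning the profiler on looks like, here is a minimal sketch that builds a search body with the `profile` flag set. The helper name `build_profiled_search` and the `status` field are hypothetical; the body would be sent to the `_search` endpoint of whichever index you are investigating.

```python
import json


def build_profiled_search(query):
    """Wrap a query in a search body with profiling switched on."""
    return {"profile": True, "query": query}


# Hypothetical field name and value, for illustration only.
body = build_profiled_search({"term": {"status": "active"}})
print(json.dumps(body, indent=2))
```

The response should then carry a `profile` section alongside the usual hits, which breaks down time spent per shard in the query phase.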

As for using Marvel to help tune ES, it's more of an operational tool for tuning overall cluster performance than for tuning individual requests. Having said that, you can point it at a fresh instance that runs only the request in question, and tune based on what you see for that request.

The things that you need to be aware of when tuning any request really have nothing to do with Marvel, and everything to do with some general guidelines:

1. The size parameter of every level of the request has a direct impact on the amount of data that needs to be passed around (both externally, which is the obvious part, but also internally within the cluster).
   - This is a big problem that I see users frequently run into by requesting scary sizes.
2. The amount of work done by the query is a baseline requirement for getting the overall request to run.
   - If you are not using filters when possible, then you should rework the request to use them because filters can be cached for repeated requests.
3. Any sorting that you may be doing.
   - It's unclear which version you're using, but it's possible that you're using fielddata rather than doc values, which may be causing a lot of performance issues (it's more likely that you're using fielddata in ES 1.x).
4. The amount of work done by the aggs (aka aggregations) at each step.
   - It's unclear which version you're using, but it's possible that you're using fielddata rather than doc values, which may be causing a lot of performance issues (it's more likely that you're using fielddata in ES 1.x).
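
To make the first two points concrete, here is a minimal sketch of an aggregation request that returns no hits (`size` of 0), keeps the bucket count modest, and puts its conditions in filter context so they can be cached. It assumes the ES 2.x `bool`/`filter` syntax (ES 1.x would use a `filtered` query instead); the helper name and the field names `status` and `customer_id` are hypothetical.

```python
import json


def build_agg_request(filters, agg_field, bucket_count=10):
    """Build a search body that only computes a terms aggregation."""
    return {
        # No hits are returned; only the aggregation result comes back.
        "size": 0,
        # Filter context: no scoring, and the filters are cacheable.
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "by_field": {
                # Keep the bucket count small; large sizes multiply the
                # data each shard has to send to the coordinating node.
                "terms": {"field": agg_field, "size": bucket_count}
            }
        },
    }


# Hypothetical fields, for illustration only.
request = build_agg_request([{"term": {"status": "active"}}], "customer_id")
print(json.dumps(request, indent=2))
```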

Which of those would Marvel help with? Really only the last two because of the repeated sub-bullet. But it also shows search latency, which should highlight when this is becoming a problem.

Another point that I noticed from your post was that you mentioned 6 nodes were running on 2 VMs. Why are there 3 nodes per VM? You could look at Marvel to see what each node is doing in terms of its performance metrics while this type of slow request is being handled: does their JVM Heap utilization go up significantly? Does the CPU go crazy?
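
If you want the same per-node view outside of Marvel, a rough sketch is to pull the nodes stats response (`_nodes/stats`) while the slow request runs and compare heap and CPU across nodes. The helper below only parses an already-fetched response; it assumes the `jvm.mem.heap_used_percent` and `process.cpu.percent` fields are present in your version, and the sample data is made up for illustration.

```python
def summarize_node_stats(stats):
    """Return (name, heap used %, process CPU %) per node, highest heap first."""
    rows = []
    for node in stats.get("nodes", {}).values():
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        cpu = node.get("process", {}).get("cpu", {}).get("percent")
        rows.append((node.get("name"), heap, cpu))
    # Nodes with no heap figure sort last.
    return sorted(rows, key=lambda r: r[1] if r[1] is not None else -1, reverse=True)


# Made-up sample in the shape of a nodes stats response.
sample = {
    "nodes": {
        "id1": {
            "name": "node-a",
            "jvm": {"mem": {"heap_used_percent": 62}},
            "process": {"cpu": {"percent": 35}},
        },
        "id2": {
            "name": "node-b",
            "jvm": {"mem": {"heap_used_percent": 91}},
            "process": {"cpu": {"percent": 88}},
        },
    }
}

for name, heap, cpu in summarize_node_stats(sample):
    print(name, heap, cpu)
```

Taking one snapshot before the request and one during it makes it easier to see which of the nodes sharing a VM are actually under pressure.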

Hope that helps,
Chris
