I don't believe any current API endpoints support this, but are there plans to add better profiling information for ES aggregation queries? We'll see the same agg queries return in 11s, then in <5s, then in >11s again. Sometimes we can see associated filter cache expirations, but it's really hard to tie these to one specific query in our production environment, since multiple users are executing queries simultaneously.
It'd be really helpful to optionally see where aggregation queries are spending the bulk of their time to help us understand what to improve in the future.
We have this in SPM. You can capture transaction traces, including distributed ones that span multiple components, servers, and network hops. Elasticsearch support is built in. If you want to capture calls in your own apps, you can add custom pointcuts and get deeper insight. See https://sematext.atlassian.net/wiki/display/PUBSPM/Transaction+Tracing for details. The upcoming SPM release will also show you the whole map of your applications, components, and servers and how they talk to each other, so you'll see the bigger picture around your ES cluster.