Exporting Elastic Cloud cluster performance stats

Hi @kielni, thanks for the feedback. I agree, and I will pass that on.
(Our self-managed offerings have multiple ways to export Elasticsearch metrics to the monitoring system of your choice; I am not suggesting you switch to them. For the Cloud offering, I think we are still working on it. Strangely, it takes a little more effort there.)

Now on to your other questions:
1st) I would not try to use the metrics charts under Deployments / [deployment] / Performance. tl;dr: they are not as reliable as the metrics in Stack Monitoring. (They are a holdover from the past and were originally meant for a quick glance. Apologies, yes, it is confusing.)

So I will only compare against the charts within Stack Monitoring, specifically the totals, nodes, etc. Stack Monitoring is the only capability I would use to inspect performance at this time.

2nd) With respect to your aggregations, queries, and math, I think you are close, but you are using some incorrect fields or assumptions.

"node_stats.indices.indexing.index_time_in_millis"

is not an elapsed time. From the nodes stats API documentation:

"index_time_in_millis: (integer) Total time in milliseconds spent performing indexing operations. "

index_time_in_millis is the time actually spent indexing the documents. It is used to calculate the average time per indexing operation; it is not the elapsed (wall-clock) time, and the same goes for the query timings. So it is not the value you should divide by to get indexing operations per second (or per any other time unit).
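To make the distinction concrete, here is a small Python sketch (the function names are hypothetical) contrasting the two divisions. It uses the deltas from the example results in this post: 54,632 indexing operations, 6,189 ms of indexing time, and roughly 290 s of wall-clock time.

```python
def ops_per_busy_second(delta_ops: int, delta_index_time_ms: int) -> float:
    """Operations per second of indexing *work* time (not a throughput figure)."""
    return delta_ops / (delta_index_time_ms / 1000.0)


def ops_per_wall_clock_second(delta_ops: int, elapsed_s: float) -> float:
    """Operations per second of elapsed (wall-clock) time: the indexing rate."""
    return delta_ops / elapsed_s


delta_ops = 54_632            # change in index_total
delta_index_time_ms = 6_189   # change in index_time_in_millis
elapsed_s = 290.0             # change in timestamp, in seconds

print(f"per busy second:       {ops_per_busy_second(delta_ops, delta_index_time_ms):.0f}")
print(f"per wall-clock second: {ops_per_wall_clock_second(delta_ops, elapsed_s):.0f}")
```

The first number (about 8,800) is what you get by dividing by index_time_in_millis; only the second (about 188) matches the indexing rate shown in Stack Monitoring.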

So, to get index/sec or query/sec, divide the change in the counter by the elapsed time between the first and last samples.

Here is mine; these numbers line up with what I see in Stack Monitoring and make sense.

GET .monitoring-es-7-mb-2021.03.02/_search
{
  "aggs": {
    "node": {
      "terms": {
        "field": "source_node.name",
        "size": 5
      },
      "aggs": {
        "from_ts": {
          "min": {
            "field": "timestamp"
          }
        },
        "to_ts": {
          "max": {
            "field": "timestamp"
          }
        },
        "from_index_count": {
          "min": {
            "field": "node_stats.indices.indexing.index_total"
          }
        },
        "to_index_count": {
          "max": {
            "field": "node_stats.indices.indexing.index_total"
          }
        },
        "from_index_time_ms": {
          "min": {
            "field": "node_stats.indices.indexing.index_time_in_millis"
          }
        },
        "to_index_time_ms": {
          "max": {
            "field": "node_stats.indices.indexing.index_time_in_millis"
          }
        },
        "from_search_count": {
          "min": {
            "field": "node_stats.indices.search.query_total"
          }
        },
        "to_search_count": {
          "max": {
            "field": "node_stats.indices.search.query_total"
          }
        },
        "from_search_time": {
          "min": {
            "field": "node_stats.indices.search.query_time_in_millis"
          }
        },
        "to_search_time": {
          "max": {
            "field": "node_stats.indices.search.query_time_in_millis"
          }
        },
        "sum_index_time": {
          "sum": {
            "field": "node_stats.indices.indexing.index_time_in_millis"
          }
        },
        "sum_query_time": {
          "sum": {
            "field": "node_stats.indices.search.query_time_in_millis"
          }
        },
        "heap_used_percent": {
          "avg": {
            "field": "node_stats.jvm.mem.heap_used_percent"
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "node_stats"
          }
        },
        {
          "term": {
            "cluster_uuid": "asasdfasdfsadfasasfdasdf"
          }
        },
        {
          "term": {
            "source_node.name": {
              "value": "instance-0000000073"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-5m"
            }
          }
        }
      ]
    }
  }
}
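The from/to aggregations above follow a fixed pattern (min and max of each counter), so if you want to script this, the body can be generated. A minimal Python sketch, assuming you only need the timestamp and counter pairs (the sum_ and heap aggregations are left out); the helper name and the cluster UUID placeholder are hypothetical. The resulting JSON could be pasted into the Kibana console or sent with an HTTP client of your choice.

```python
import json

# Counter fields used in the query above. Each gets a min ("from_") and a
# max ("to_") aggregation, because these node_stats values only increase.
COUNTER_FIELDS = {
    "index_count": "node_stats.indices.indexing.index_total",
    "index_time_ms": "node_stats.indices.indexing.index_time_in_millis",
    "search_count": "node_stats.indices.search.query_total",
    "search_time": "node_stats.indices.search.query_time_in_millis",
}


def build_node_stats_query(cluster_uuid: str, node_name: str, window: str = "now-5m") -> dict:
    """Build a search body shaped like the one above (hypothetical helper)."""
    aggs = {
        "from_ts": {"min": {"field": "timestamp"}},
        "to_ts": {"max": {"field": "timestamp"}},
    }
    for name, field in COUNTER_FIELDS.items():
        aggs[f"from_{name}"] = {"min": {"field": field}}
        aggs[f"to_{name}"] = {"max": {"field": field}}
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type": "node_stats"}},
                    {"term": {"cluster_uuid": cluster_uuid}},
                    {"term": {"source_node.name": {"value": node_name}}},
                    {"range": {"timestamp": {"gte": window}}},
                ]
            }
        },
        "aggs": {
            "node": {
                "terms": {"field": "source_node.name", "size": 5},
                "aggs": aggs,
            }
        },
    }


if __name__ == "__main__":
    body = build_node_stats_query("<your-cluster-uuid>", "instance-0000000073")
    print(json.dumps(body, indent=2))
```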

# Results

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 30,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "node" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "instance-0000000073",
          "doc_count" : 30,
          "sum_index_time" : {
            "value" : 1.271088426E9
          },
          "from_search_count" : {
            "value" : 1.1329617E7
          },
          "heap_used_percent" : {
            "value" : 58.7
          },
          "from_index_time_ms" : {
            "value" : 4.2366507E7
          },
          "to_ts" : {
            "value" : 1.614657039624E12,
            "value_as_string" : "2021-03-02T03:50:39.624Z"
          },
          "sum_query_time" : {
            "value" : 2.61023127E8
          },
          "to_search_time" : {
            "value" : 8702127.0
          },
          "to_search_count" : {
            "value" : 1.133298E7
          },
          "to_index_count" : {
            "value" : 5.28768085E8
          },
          "from_ts" : {
            "value" : 1.614656749623E12,
            "value_as_string" : "2021-03-02T03:45:49.623Z"
          },
          "to_index_time_ms" : {
            "value" : 4.2372696E7
          },
          "from_index_count" : {
            "value" : 5.28713453E8
          },
          "from_search_time" : {
            "value" : 8699523.0
          }
        }
      ]
    }
  }
}
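The arithmetic on this response can also be scripted. A minimal Python sketch, assuming the response has been loaded into a dict (here a trimmed copy of the bucket above, keeping only the fields the math needs); the helper names are hypothetical. It divides each counter delta by the elapsed time for rates and divides the time delta by the count delta for average latency.

```python
# Trimmed copy of the response above: only the fields needed for the math.
RESPONSE = {
    "aggregations": {
        "node": {
            "buckets": [
                {
                    "key": "instance-0000000073",
                    "doc_count": 30,
                    "from_ts": {"value": 1.614656749623e12},
                    "to_ts": {"value": 1.614657039624e12},
                    "from_index_count": {"value": 5.28713453e8},
                    "to_index_count": {"value": 5.28768085e8},
                    "from_index_time_ms": {"value": 4.2366507e7},
                    "to_index_time_ms": {"value": 4.2372696e7},
                    "from_search_count": {"value": 1.1329617e7},
                    "to_search_count": {"value": 1.133298e7},
                    "from_search_time": {"value": 8699523.0},
                    "to_search_time": {"value": 8702127.0},
                }
            ]
        }
    }
}


def delta(bucket: dict, name: str) -> float:
    """Difference between the max ("to_") and min ("from_") of a value."""
    return bucket[f"to_{name}"]["value"] - bucket[f"from_{name}"]["value"]


def node_rates(bucket: dict) -> dict:
    elapsed_s = delta(bucket, "ts") / 1000.0  # timestamps are epoch milliseconds
    index_ops = delta(bucket, "index_count")
    search_ops = delta(bucket, "search_count")
    return {
        "elapsed_s": elapsed_s,
        "index_per_s": index_ops / elapsed_s,
        "avg_index_ms": delta(bucket, "index_time_ms") / index_ops if index_ops else 0.0,
        "search_per_s": search_ops / elapsed_s,
        "avg_search_ms": delta(bucket, "search_time") / search_ops if search_ops else 0.0,
    }


for b in RESPONSE["aggregations"]["node"]["buckets"]:
    r = node_rates(b)
    print(b["key"], {k: round(v, 4) for k, v in r.items()})
```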

You need to be a bit careful with the elapsed time. I took the difference between the timestamps, although technically these samples are collected in 10s buckets.
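Here is a small Python sketch of the two ways to get the elapsed time, assuming a fixed 10 s collection interval with no missed samples (the constant and function names are hypothetical). With 30 documents there are 29 intervals between the first and last sample, so both approaches give about 290 s here.

```python
COLLECTION_INTERVAL_S = 10  # assumed Stack Monitoring collection period


def elapsed_from_timestamps(from_ts_ms: float, to_ts_ms: float) -> float:
    """Elapsed seconds between the min and max timestamps (epoch millis)."""
    return (to_ts_ms - from_ts_ms) / 1000.0


def elapsed_from_sample_count(doc_count: int, interval_s: int = COLLECTION_INTERVAL_S) -> int:
    """N samples span N - 1 collection intervals (assumes no missed collections)."""
    return max(doc_count - 1, 0) * interval_s


print(elapsed_from_timestamps(1614656749623, 1614657039624))  # 290.001
print(elapsed_from_sample_count(30))  # 290
```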

Name                                                   Value
From index count                                 528,713,453
To index count                                   528,768,085
Delta index count (number of indexing events)         54,632
Elapsed time, sec (difference in timestamps)             290
Indexing events / sec (correct)                          188

Name                                                   Value
From index count                                 528,713,453
To index count                                   528,768,085
Delta index count (number of indexing events)         54,632
From indexing time, ms                            42,366,507
To indexing time, ms                              42,372,696
Delta indexing time, ms                                6,189
Avg indexing time, ms / request (correct)             0.1133
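The two rows marked (correct) reduce to these formulas, written with the values from the results above:

```latex
\text{indexing rate}
  = \frac{\Delta\,\texttt{index\_total}}{\Delta t}
  = \frac{528{,}768{,}085 - 528{,}713{,}453}{290\ \text{s}}
  = \frac{54{,}632}{290\ \text{s}}
  \approx 188\ \text{ops/s}

\text{avg indexing time}
  = \frac{\Delta\,\texttt{index\_time\_in\_millis}}{\Delta\,\texttt{index\_total}}
  = \frac{42{,}372{,}696 - 42{,}366{,}507}{54{,}632}\ \text{ms}
  = \frac{6{,}189}{54{,}632}\ \text{ms}
  \approx 0.1133\ \text{ms/op}
```

The same pattern applies to search: divide the change in query_total by the elapsed time for queries/sec, and divide the change in query_time_in_millis by the change in query_total for average query latency.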

The index_total and query_total fields are monotonically increasing counters, like many other system metrics (e.g. I/O bytes_in, bytes_out), and they are typically graphed as a rate. They do have actual 10s time buckets associated with them, but the timestamps can generally be used as a good proxy. (Not sure whether that helps or not.)
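Graphing a counter as a rate means taking the delta between consecutive samples and dividing by the time between them. A minimal Python sketch, assuming you already have (timestamp_ms, counter) pairs for one node; the function name is hypothetical, and skipping negative deltas is an assumption to cope with counter resets (for example after a node restart).

```python
from typing import List, Tuple


def counter_rates(samples: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Turn (timestamp_ms, counter) samples into (timestamp_ms, per-second rate) points.

    Samples are sorted by timestamp first. Negative deltas are skipped on the
    assumption that they come from a counter reset (e.g. a node restart).
    """
    ordered = sorted(samples)
    rates = []
    for (t0, c0), (t1, c1) in zip(ordered, ordered[1:]):
        dt_s = (t1 - t0) / 1000.0
        dc = c1 - c0
        if dt_s <= 0 or dc < 0:
            continue
        rates.append((t1, dc / dt_s))
    return rates


# Three synthetic samples, 10 s apart.
print(counter_rates([(0, 100), (10_000, 1_980), (20_000, 3_900)]))
# [(10000, 188.0), (20000, 192.0)]
```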

Hope that helps a bit.