I'm trying to collect the number of queries that users send to Elasticsearch to understand how many queries per day/month are submitted to our clusters.
I've tried using the following metrics described on the Nodes stats API page of the Elasticsearch Guide [8.6]: indices.search.fetch_total and indices.search.query_total.
However, I'm not sure exactly what these two metrics mean.
I'm using Datadog's Elastic integration to collect these metrics (they are submitted as elasticsearch.search.fetch.total.count and elasticsearch.search.query.total.count). My idea was to aggregate the count with a cumulative sum over a given period (1 day or 1 month) and take the last value as the number of queries submitted in that period.
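Taking the last value of a cumulative sum over a window is the same as summing the per-interval counts that fall inside that window. A minimal Python sketch of that aggregation, assuming the samples have been exported as (timestamp, count) pairs and that each count is the increase of the Elasticsearch counter since the previous check (an assumption about how the Datadog ".count" metric is reported, worth verifying). The function name totals_by_period is hypothetical, and buckets are computed in UTC:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Each sample is (unix_timestamp_seconds, count_since_previous_sample).
# ASSUMPTION: the exported ".count" values are per-interval increases,
# not the raw cumulative counter value.


def totals_by_period(samples, period="day"):
    """Sum per-interval counts into daily ("day") or monthly ("month") UTC buckets."""
    fmt = {"day": "%Y-%m-%d", "month": "%Y-%m"}[period]
    totals = defaultdict(int)
    for ts, count in samples:
        key = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
        totals[key] += count
    return dict(totals)


# Illustrative samples: two on 2023-02-01 (UTC) and one on 2023-02-02.
samples = [(1675209600, 5), (1675213200, 7), (1675296000, 4)]
print(totals_by_period(samples, "day"))    # {'2023-02-01': 12, '2023-02-02': 4}
print(totals_by_period(samples, "month"))  # {'2023-02': 16}
```

If the exported values were instead the raw cumulative counter, the per-interval increases would have to be computed first (taking counter resets into account) before bucketing.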
I ran a few small experiments on a test cluster, and here is what I understood:
elasticsearch.search.query.total.count seems to count the number of documents returned by queries, which is not what we want; the metric name is confusing.
elasticsearch.search.fetch.total.count seems to count the number of queries, which is probably what we want, but there is a problem:
An Elasticsearch cluster with relatively few active queries shows very large values for elasticsearch.search.fetch.total.count (up to 700,000,000 per month), which is surprising and probably wrong.
Hello @Milad_Heydariaan, welcome to the community!
I believe the stats you are referring to are correct: search.query_total in _nodes/stats/indices should, by definition, give the total number of query operations. Have you tried running the request in the Dev Tools Console and comparing the result with elasticsearch.search.query.total.count?
I'd suggest setting up a script that reads _nodes/stats directly rather than relying on Datadog or any other third-party API, since ES already provides what you need here.
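A minimal sketch of such a script in Python, using only the standard library. It calls the nodes stats API restricted to the search index metric (_nodes/stats/indices/search) and sums query_total and fetch_total over all nodes. The base URL http://localhost:9200 is an assumption, and authentication/TLS are not handled; a cluster with security enabled would need credentials added to the request. The function names are hypothetical:

```python
import json
import urllib.request


def fetch_node_search_stats(base_url="http://localhost:9200"):
    """Fetch per-node search stats (ASSUMES an unauthenticated HTTP endpoint)."""
    url = f"{base_url}/_nodes/stats/indices/search"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def cluster_search_totals(stats):
    """Sum query_total and fetch_total over every node in a _nodes/stats response."""
    totals = {"query_total": 0, "fetch_total": 0}
    for node in stats["nodes"].values():
        search = node["indices"]["search"]
        for key in totals:
            totals[key] += search[key]
    return totals


if __name__ == "__main__":
    print(cluster_search_totals(fetch_node_search_stats()))
```

Keeping the parsing separate from the HTTP call makes it easy to check against a saved response; for a two-node cluster reporting query_total values of 339 and 6854, it returns a query_total of 7193.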
I checked _nodes/stats/indices again and I see:
.nodes.*.indices.search.query_total shows 339 and 6854 (total: 7193)
.nodes.*.indices.search.fetch_total shows 7 and 763 (total: 770)
This surprises me: query_total increased by 3 and fetch_total by 2, whereas I expected query_total to increase by just 1, since I ran only one query.
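One likely explanation, worth confirming against the documentation for your version: these counters are maintained per shard, not per search request. A search runs its query phase on one copy of every shard it targets, so query_total grows by the number of shards searched, while the fetch phase only runs on shards holding documents in the final top hits, so fetch_total grows by the same amount or less. One query against a three-shard index would then add 3 to query_total, and 2 to fetch_total if only two shards contributed hits. The sketch below turns the difference between two cluster-wide snapshots into a rough request estimate, under the hypothetical simplification that every search targets the same number of shards (real traffic often hits indices with different shard counts, so treat the result as an approximation). The snapshot values used are the ones implied above (7193 - 3 and 770 - 2 before the query):

```python
def search_deltas(before, after):
    """Per-counter increase between two cluster-wide snapshots (dicts of ints)."""
    return {key: after[key] - before[key] for key in after}


def estimate_requests(query_delta, shards_per_search):
    """Rough request count, ASSUMING every search hits exactly `shards_per_search` shards."""
    if shards_per_search <= 0:
        raise ValueError("shards_per_search must be positive")
    return query_delta / shards_per_search


before = {"query_total": 7190, "fetch_total": 768}
after = {"query_total": 7193, "fetch_total": 770}
deltas = search_deltas(before, after)
print(deltas)                                        # {'query_total': 3, 'fetch_total': 2}
print(estimate_requests(deltas["query_total"], 3))   # 1.0
```

Because of this per-shard counting, totals can grow much faster than the number of user requests on clusters with many shards per index.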
My goal is to see how many queries are flowing to our cluster and report aggregated statistics daily/monthly.
If query_total or fetch_total are not suitable for what I need, is there another metric that I can use?
Another question: do these metrics keep increasing for the lifetime of the cluster, or do they reset to 0 at some point?
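As far as I know, the nodes stats counters are kept in memory on each node and start again from 0 when that node restarts, so they are not lifetime-of-the-cluster totals, and a sum across nodes can drop after a restart. A polling script therefore has to treat a decrease as a reset. A minimal sketch, assuming you store periodic readings of a counter (for example the summed query_total) in time order; the function name is hypothetical:

```python
def increase_with_resets(readings):
    """Total increase of a monotonic counter across ordered readings, tolerating resets.

    If a reading is lower than the previous one, the counter is assumed to have
    been reset to 0 (e.g. a node restart), so the new reading itself is counted
    as the increase since the reset.
    """
    total = 0
    for prev, curr in zip(readings, readings[1:]):
        total += curr - prev if curr >= prev else curr
    return total


print(increase_with_resets([100, 150, 20, 50]))  # 100 (50 + 20 + 30)
```

Any increments between the last reading before a restart and the restart itself are lost, so polling more frequently reduces that error; polling per node rather than summing first also avoids one node's restart masking another node's growth.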