Gauge Visualization - Exclude docs where field is empty


#1

Updated Question

We've been able to identify the correct Painless script for our scripted fields and the correct metric type (average) to display the data the way we want to see it. The problem we're still having is that since our docs don't always have the field we're calculating on in the scripted fields, the gauge visualization (we settled on that vs. the pie chart) is still counting those docs in its average calculation. So even though 100% of the calls to a certain endpoint may be getting 404s, the gauge is still only showing 33% because only a third of the docs in this index actually have information about that endpoint.
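To make the dilution concrete, here is a minimal Python sketch of the arithmetic only. It does not model Kibana or Elasticsearch. The three sample values are hypothetical: one document reports the endpoint at 100%, and the other two lack the field.

```python
from statistics import mean

# Hypothetical per-document percentages (not real data from the index).
with_zero_default = [100.0, 0.0, 0.0]    # scripted field returns 0 when the field is missing
with_null_default = [100.0, None, None]  # scripted field returns null when the field is missing


def average_all(values):
    """Average every value as-is, so placeholder zeros dilute the result."""
    return mean(values)


def average_present(values):
    """Average only the documents that actually produced a value."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


print(round(average_all(with_zero_default), 1))
print(average_present(with_null_default))
```

The first figure is the diluted one-third value described above; the second is the figure the gauge should show once documents without the field are left out of the average.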

I've modified our scripted fields so that they return null instead of 0 when the "total_count" field mentioned below is empty, but how do I then exclude it from the gauge? We have lots of fields that we could use to filter, but I haven't found any meaningful information on the JSON Input field in the gauge chart to show me how to leverage that. Any ideas?

Original Post

I'm somewhat new to the ELK stack, so let me preface this by saying that if someone can point me to the right section in the docs to find my own answer, I would value that just as well as someone saying "Do it this way". :slight_smile:

We are using Elasticsearch to index performance messages from our services. We had to switch from sending a message for each call to sending a single message every minute with rollup metrics from the services due to traffic issues. We're trying to update our visualizations to match.

The problem that we're running into is that for some of these visualizations (namely pie charts), we're struggling to wrap our brains around the necessary steps to express the ratio of two separate values in a single pie chart, as opposed to just the counts for each value over a set of documents. To be specific, we now have the number of 3xx responses and 4xx responses as separate fields in each document, whereas we used to just count the number of documents with a 3xx response code vs those with a 4xx response code.

I'm certain that there's a way we can use Painless or even some simple JSON to achieve this, but I'm struggling to figure out how. Can anybody point me to the information I need to figure out the answer?


(Brandon Kobel) #2

Hey @Speedman, are you inserting a single document every minute with a summary of all the response codes, or multiple documents every minute? It would help if you could provide an example of the document or documents that you index each minute.


#3

Hi @brandon_kobel. We are inserting a single document every minute with a summary of the response codes, like so (truncated for clarity):

{
  "_index": "...",
  ...
  "_source": {
      ...
    "type": "PERFORMANCE",
    ...
    "messageDetail": {
        "http-inbound_endpoint_3xx_count": "14",
        "http-inbound_endpoint_4xx_count": "3",
        "http-inbound_endpoint_total_count": "17"
    },
    "routingKey": "our-routing-key",
    "timestamp": "1234567890"
  }
}

So far, I've tried adding scripted fields, but I haven't gotten them to work. That may be because I've misunderstood what the scripts should return, or which configuration is appropriate for the visualization. Here's the formula we're using at the moment:

if (!doc['messageDetail.http-inbound_endpoint_total_count'].empty) {
  if (!doc['messageDetail.http-inbound_endpoint_4xx_count'].empty) {
    return (doc['messageDetail.http-inbound_endpoint_4xx_count'].value / doc['messageDetail.http-inbound_endpoint_total_count'].value) * 100;
  } else {
    return 0;
  }
} else {
  return 0;
}
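One thing worth checking in a formula shaped like this is the order of operations. This rests on an assumption: the count fields would have to be mapped as integer types. The sample document shows them as strings in `_source`, so the real mapping is unknown. If they are integers, dividing before multiplying truncates the fraction, as in Java-style integer division. The Python sketch below imitates that with `//`; the helper names are made up for illustration.

```python
def percent_truncating(part, total):
    """Divide first with integer division, then scale.

    For non-negative counts, Python's // matches Java-style truncation.
    """
    return (part // total) * 100


def percent_floating(part, total):
    """Scale by a float first so the fractional part survives."""
    return part * 100.0 / total


# Counts taken from the sample document: 3 of 17 responses were 4xx.
print(percent_truncating(3, 17))
print(round(percent_floating(3, 17), 2))
```

With integer inputs, the truncating version can only ever produce 0 or 100, while the floating version keeps the real ratio.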

We've tried a few variations on that, such as specifying the format as number vs. percentage, not multiplying by 100, and adding the 4xx and 3xx fields together for the denominator. We've also tried a few different metric types on the charts, such as Sum, Count, and Average, and attempted to use JSON to provide the data for the metric. The problem is that if we set our time scope to the exact second the message comes in, we get a seemingly accurate value on the gauge (like 50% or 100%). But if we expand the scope, either the value disappears completely (it should at least be non-zero), or it jumps to something ridiculous (like 900%).
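One possible reading of these symptoms, offered as an assumption rather than a diagnosis, is that the metric is combining per-document percentages. A Sum over several documents adds the percentages together, so nine documents at 100% each would read as 900%. The Python sketch below uses hypothetical rollup documents to compare three ways of combining them.

```python
# Hypothetical one-minute rollup documents: (4xx count, total count).
docs = [(3, 17), (10, 10), (0, 40)]

# Per-document percentage, as a scripted field would compute it.
per_doc_pct = [100.0 * bad / total for bad, total in docs if total > 0]

# Sum metric: adds percentages, so it can exceed 100.
sum_of_pcts = sum(per_doc_pct)

# Average metric: every document counts equally, whatever its traffic.
avg_of_pcts = sum(per_doc_pct) / len(per_doc_pct)

# Ratio of sums: total 4xx responses over total responses in the window.
ratio_of_sums = 100.0 * sum(b for b, _ in docs) / sum(t for _, t in docs)

print(round(sum_of_pcts, 1))
print(round(avg_of_pcts, 1))
print(round(ratio_of_sums, 1))
```

Only the ratio of sums weights each minute by its traffic and stays between 0 and 100 for any time range. Whether the gauge can express that directly depends on the Kibana version and visualization type.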


#4

Updated the original post and title to reflect where we are at in terms of figuring this out.

