I should offer some more context here, because I suspect it might be significant, and I think it's me that's missing something: I'm looking at TSVB gauge visualizations developed by a colleague. I want to omit the gauges displayed with the values 0 and 14400. Currently, I'm applying a filter on the Data > Options tab rather than the Panel filter under Panel options.
This by itself:
works as expected and desired: the gauges with the value 14400 are no longer displayed. However, some of the gauges show the value 0 (zero). I can find no documents that contain the CPCHRMSU field with the value 0. I can find documents where the CPCHRMSU field does not exist.
This by itself:
works as expected and desired: gauges with the value 0 are no longer displayed.
However, the combination that I originally cited:
CPCHRMSU:* and CPCHRMSU<14400
omits gauges with the value 14400, but displays gauges with the value 0. CPCHRMSU>0 and CPCHRMSU<14400 (that is, with >0 instead of :*) displays identical results: the unwanted 0-value gauges remain.
Some more context: the viz uses the Top Hit aggregation (Size: 1, Aggregate with: Max). I've read the doc topic for top_hits, including the tip "We do not recommend using top_hits as a top-level aggregation". I didn't develop this viz: I'm posting to this topic primarily to seek advice and educate myself so that I can talk to the developer about it. I wonder whether the Top Hit aggregation is somehow responsible for what I've interpreted here so far as a problem with the filter. But my thinking is not clear enough on this point to understand what's going on.
I understand that I probably need to start performing some searches directly myself; say, using curl, so I can better understand what's really going on here in terms of the search and the results returned.
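For anyone wanting to do the same, here is a minimal sketch of the kind of request body such a direct search might use. It is not the exact request that TSVB issues; it only mirrors the settings described in this topic (group by xyzName, 100s intervals on utcTimestamp, Max of CPCHRMSU). The helper name build_search_body, the terms size of 10, and the start of the time range are made up for illustration. Since xyzName is a text field, the terms aggregation may need a keyword sub-field instead.

```python
import json


def build_search_body(start, end, interval="100s", max_value=14400):
    """Build an Elasticsearch _search body (hypothetical helper, not TSVB's own request)."""
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"utcTimestamp": {"gte": start, "lte": end}}},
                    # Document-level filters: the field must exist and be below the cap.
                    {"exists": {"field": "CPCHRMSU"}},
                    {"range": {"CPCHRMSU": {"lt": max_value}}},
                ]
            }
        },
        "aggs": {
            "by_name": {
                # Assumption: xyzName (or a keyword sub-field of it) is aggregatable.
                "terms": {"field": "xyzName", "size": 10},
                "aggs": {
                    "per_interval": {
                        "date_histogram": {
                            "field": "utcTimestamp",
                            "fixed_interval": interval,
                        },
                        "aggs": {"max_value": {"max": {"field": "CPCHRMSU"}}},
                    }
                },
            }
        },
    }


# The start time below is a made-up value; only the end time appears in this topic.
body = build_search_body("2021-01-16T09:00:00.000Z", "2021-01-16T09:55:00.000Z")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a search request with curl or pasted into Kibana Dev Tools, which makes it easier to see which buckets come back empty for each xyzName value.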
Is it possible that the display value of your field is not the same as the stored value? For example, if the display value is a duration, then there is a difference between the stored value in nanoseconds vs the display value in milliseconds.
Trying to understand this (I'm still familiarizing myself with both the underlying data and the already-developed viz) is starting to do my head in.
I've just noticed that the "Data timerange mode" is set to "Last value".
More disclosure: the metric for the gauges specifies a Group by Terms, By a (text) field (let's call it xyzName). For some xyzName field values, there are documents with CPCHRMSU field values in other time intervals in the time range, but there is no document for that value in the last time interval.
I could still be wrong about this, but such term values seem to be the ones that are displayed with bogus 0-value gauges. I have no idea what to do about this.
You probably want to change the data timerange mode. I recently added a recommendation to the TSVB docs that identifies this as a common source of confusion. If you see data in the Time Series chart but not the Gauge, then that's the cause.
Interestingly, by contrast, the legend of the TSVB Time Series viz shows no value (a blank) for ss27, ss25, and ss23, not zero.
To recap, and also to add further detail:
Aggregation: Max (was: Top Hit, but I wanted to rule out possible issues with that agg)
Data timerange mode: Last value
Drop last bucket? Yes
The time range ends at 2021-01-16T09:55:00.000Z
Based on the "last value" timerange mode, an interval of 100s, and dropping the last bucket, the time value "in play" here is 2021-01-16T09:53:20.000Z (end of time range, minus 100s: that is, 1 minute 40 seconds).
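The arithmetic above can be sketched as follows. This is only an illustration under two assumptions that I have not verified against the TSVB source: that buckets start on multiples of the interval counted from the Unix epoch, and that the end of the time range falls inside the last bucket. The function name is made up.

```python
from datetime import datetime, timezone


def last_displayed_bucket_start(range_end, interval_seconds, drop_last_bucket):
    """Start time of the bucket that a "last value" panel would read (sketch only)."""
    epoch_seconds = int(range_end.timestamp())
    # Assumption: bucket boundaries are multiples of the interval since the epoch,
    # so the bucket containing range_end starts at the rounded-down value.
    last_start = epoch_seconds - (epoch_seconds % interval_seconds)
    if drop_last_bucket:
        # Dropping the last bucket moves the "last value" back by one interval.
        last_start -= interval_seconds
    return datetime.fromtimestamp(last_start, tz=timezone.utc)


end = datetime(2021, 1, 16, 9, 55, 0, tzinfo=timezone.utc)
print(last_displayed_bucket_start(end, 100, drop_last_bucket=True).isoformat())
# 2021-01-16T09:53:20+00:00
print(last_displayed_bucket_start(end, 100, drop_last_bucket=False).isoformat())
# 2021-01-16T09:55:00+00:00
```

In this particular case the end of the time range happens to be an exact multiple of 100 seconds since the epoch, so the rounding step changes nothing and the result matches the simple "end minus 100s" calculation.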
I've done some more research using Discover in Kibana, outside of the TSVB viz editor.
Using the following filters:
xyzName is one of: ss27, ss25, ss23, ss22, ss21
utcTimestamp (the time filter field for this index pattern) is: 2021-01-16T09:53:20.000Z
I'll admit, I'm in way over my head. Having said that, based on what I've seen in the HTTP API responses for the TSVB (in my browser's Developer Tools), there's a bug here.
Without the data filter < 14400, the data response for label ss27 includes the following penultimate item (corresponding, I think, to the time value 2021-01-16T09:53:20.000Z):
With the data filter < 14400:
I should add here that, with the data filter < 14400, the response contains label values only for those 5 values of xyzName that I've cited. Without the data filter < 14400, I get a heap more.
Based on a closer inspection of the data response, I think I can see what's going on here.
For those additional xyzName values that appear in the response without the data filter < 14400, every item in the response data array has the (CPCHRMSU) value 14400. With the data filter < 14400 applied, those additional xyzName values do not appear in the response at all (no matching "label" value for them).
However, ss23, ss25, and ss27 have various values of CPCHRMSU. With the data filter < 14400, results for those xyzName label values are returned; furthermore, for each data item where the CPCHRMSU value is 14400, rather than returning the value 14400, the response contains the value null (as shown above).
I'm not saying that the null value is itself a bug. The bug is what the TSVB visualization does with that value. In particular, the way that the TSVB gauge viz displays that null value as a 0-value gauge. The correct behavior is for there to be no gauge for that label.
To summarize: I think I've identified a bug. Am I correct?
There's a lot to respond to here, and I may have missed something in all the text. It is totally possible that you are running into a bug, but I am not convinced that you are seeing a bug based on what you've shared.
Elasticsearch in general does some unintuitive things with dates, and TSVB is also a bit unintuitive, so I will try to explain what I think could be happening.
Filters are applied to documents in Elasticsearch, not aggregated values. You said you have "documents with various values" and it sounds like your query is matching those values.
The Terms aggregation is the top-level query that TSVB runs, so it matches any value in your time range, including values before and after the current window. This is why you might see null values in the specific range you're looking at.
Your time interval is set to 100 seconds, which is rounded down per the ES docs. This means that your attempt to create an equivalent query in Discover might not have been exactly correct.
You've chosen to "drop last bucket", which means that the ending time range is not going to be included in the query. This is most useful when dealing with ranges like "last 7 days" where you don't have complete data for the current day, but you probably don't need this.
In conclusion, I think your configuration is the problem, not TSVB. I think that you should:
a. Change the data timerange mode to "entire timerange"
b. Stop dropping the last bucket
I debated whether or not to post that report of my observations before a test case, but, at the risk of too much info, I thought it was worthwhile documenting the evolution of this topic from question to bug description; and then, possibly, to smackdown by someone on the Elastic dev team.
While I don't understand why the original developer of this viz chose to drop the last bucket (I'll ask), I do understand why they chose "Last value". The value of the field being visualized changes over time based on a rolling 4-hour period. For a given time range, the most recent value of that field is of particular interest. An aggregation over the entire time range might be of some interest, but, in this case, users are most likely to be interested in the last value.
Apply the TSVB settings to the split documents above, which shows bucket 3. But even though bucket 3 only matches values from Charlie, Bravo is still a top-level key that exists in your dataset so we show it.
The reason step 4 is not a bug is that we have found that most users want to see all of the named entities in their data, and I believe that we have the right default already. We do sometimes add extra settings to TSVB, and maybe what you are looking for would fall into that category.
Thanks for your detailed explanation. Sincerely appreciated.
However, "This is how it works" is never a good reason for doing something wrong.
Displaying a numeric value without underlying numeric data for that value is wrong.
That is what TSVB is doing here. It's displaying the numeric value 0 without underlying numeric data to support that value.
Furthermore, because of the order of processing, it's not doing that consistently. Here, I'm referring to consistency from the user's perspective—what the user sees: the displayed visualizations—as opposed to the consistency of the underlying processing.
With reference to the example data I provided:
Alpha does not appear in the results of the initial query (that you cited) because all of its field1 values are 14400
Bravo appears because some of its field1 values are not 14400
I deliberately specified the data for Alpha and Bravo with this difference, to highlight the inconsistency in the visualizations displayed by TSVB.
Alpha and Bravo have the same field1 value, 14400, for the timestamp in question. However, TSVB displays a gauge for Bravo, but not for Alpha. This is the inconsistency to which I am referring.
Displaying a gauge for Bravo, with the bogus value 0, is a bug
Not displaying a gauge for Alpha is correct behavior
You might argue: "No, there is no inconsistency here. The underlying processing is consistent—the same steps are performed in both cases—as I have already explained to you". You would be missing my point: from the user's perspective, for the timestamp in question, both Alpha and Bravo have the same field1 value, 14400. However, TSVB treats them differently. Why? Because all of Alpha's field1 values are 14400, whereas only some of Bravo's field1 values are 14400. That's a poor reason; especially in the context of the "last value" option, where the user is interested only in the value for a particular time interval.
Do you still maintain that there is no bug here, even after I have pointed out to you the difference in behavior—the visualizations displayed—for Alpha and Bravo?
In that case: do you consider the omission of a gauge for the "named entity" Alpha to be a bug? Do you think that it, too, should appear with a (bogus!) 0-value gauge, as TSVB displays for Bravo?
What I am looking for is a solution that displays a numeric value only where there is underlying numeric data to support that value.
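As a rough sketch of what I mean, assuming the panel receives one last value per label and that a missing value arrives as null (None here): the first function keeps real zeros and drops labels with no data, while the second shows the kind of null-to-zero coercion that would produce the gauges I am seeing. Whether TSVB does exactly this internally is my assumption. The labels and values are hypothetical.

```python
def gauges_with_data(last_values):
    """Keep labels that have a numeric value, including a genuine 0; drop None."""
    return {label: value for label, value in last_values.items() if value is not None}


def gauges_with_null_as_zero(last_values):
    """Coerce missing values to 0, which makes them indistinguishable from a real 0."""
    return {label: (0 if value is None else value) for label, value in last_values.items()}


# Hypothetical last values: one label without data, one real zero, one ordinary value.
last_values = {"no_data": None, "real_zero": 0, "normal": 7200}

print(gauges_with_data(last_values))
# {'real_zero': 0, 'normal': 7200}
print(gauges_with_null_as_zero(last_values))
# {'no_data': 0, 'real_zero': 0, 'normal': 7200}
```

In the second output nothing tells the user that one of the two zeros has no data behind it.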
(How) do you expect users to distinguish between:
A gauge that displays the value 0 without underlying numeric data (that is, based on a null field value in the API response)
A gauge that displays the value 0 based on a calculation (aggregation) of actual numeric field values
Or do you think such a distinction is insignificant?
Your responses so far confirm the validity of the statement that I made earlier:
I invite your rebuttal of that statement. I don't want it to be true.
I ran this example data in TSVB and I think you're right that there are some unexpected or buggy behaviors with the "last value" mode. It is a bug to show zero instead of null. The idea of showing all series names—even ones with null values—seems to be working as intended but I think we could open an issue to track requests for this.
To clarify the second point, TSVB is entirely based on the first view of the "Time Series". Compare the time series to the other types and you'll see that we are showing the same series, but different values. So that's why I say the zero values are a bug, but the series display is not a bug.
Since TSVB does have this bug, your only workaround is to use the Gauge type that's not part of TSVB. You can take advantage of the per-panel time range in your dashboard, which will let you compare multiple time periods side by side.
The per-panel time range feature is also why I don't ever recommend the "last value" mode of TSVB. It causes problems for our users.
Thank you for that acknowledgment, and for taking the time to discuss this topic.
Yes, I agree that would be a good idea. I am happy to open that issue myself.
Unless you object (I will wait until Friday 10 AM UTC+8 for your reply), I will open an issue in the elastic/kibana repository on GitHub. In that issue, I plan to copy selected excerpts from this forum topic, and also link to this topic.
I am hoping that opening an issue will do more than "track requests"; I am hoping it will prompt code changes by the Elastic dev team that correct this behavior; or at least, offer an option to correct this behavior.
Yes. Earlier in this topic, I reported the following related observation:
Just for completeness, I thought it was worth mentioning that, understandably (given the observations in this thread), all of the TSVB visualizations, including Time Series, exhibit the same inconsistency that I reported for the gauges: with the data filter field1 < 14400, they show Bravo, but not Alpha. A consistent inconsistency.
I accept that the series display is your (the Elastic dev team's) choice: the result of a deliberate decision.
Yes, I've started looking at that. I'm having trouble reproducing some of the features (such as Numeral.js '00:00:00' formatting) that are available in TSVB (I need to refamiliarize myself with the JSON input field). But I'll leave that to other topics, and perhaps to the original developer.
Thanks again for your time on this topic. I was initially mystified by the behavior that I was observing. You've been a big help in clarifying what's going on here, and I sincerely appreciate your acknowledgment that there is, indeed, a bug.
Before moving on from this topic, I want to revisit the use case that prompted it; in particular, regarding this:
I deliberately deferred responding in detail to this until now, because I did not want the specific—perhaps, idiosyncratic—details of the use case to deflect attention from the identified bug.
Returning to the original use case...
The description of the field CPCHRMSU is "Time until capping"; or, depending on which doc you read, "Remaining time until capping".
What's "capping"? Here, it's sufficient to know that capping is system behavior that is typically undesirable. It's useful to know when systems are at risk of capping.
However, the description "[Remaining] time until capping" is simplistic. That description is accurate for most, but not all, values of CPCHRMSU.
Capping is based on a rolling 4-hour period. The maximum value of CPCHRMSU, 14400 (in seconds; meaning, 4 hours), does not mean "capping will start in 4 hours". Rather, it means "there is currently no risk of capping". It might also mean, although I'm less certain of this, "an actual value cannot be determined".
The value 0 means "already capped"; although—again, I'm less certain of this—under some specific circumstances, depending on the value of other fields, it might also have another meaning. As you can tell, I'm not a subject-matter expert here.
The intent of the original visualization is to show systems that are at risk of capping.
The original visualization had no data filter. It showed gauges with the label/value combination "Time until capping 4:00:00" (I'd used the Numeral.js syntax '00:00:00' to format the displayed value).
As already discussed, that label/value combination is bogus, misleading. Furthermore, in the context of this "at risk" viz, those gauges are noise, because, with that value, those systems are not at risk of capping. Systems with that value should not appear in the viz.
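Here is a minimal sketch of a label that avoids the false statement, based only on my reading of the field as described above (14400 means no current risk, 0 means already capped). I am not a subject-matter expert, so treat the wording of each label as an assumption. The function names are made up.

```python
NO_RISK_SECONDS = 14400  # 4 hours: the maximum value of CPCHRMSU


def format_hms(seconds):
    """Format a number of seconds as h:mm:ss, for example 14400 -> '4:00:00'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def describe_cpchrmsu(value):
    """Return a label for a CPCHRMSU value (interpretation is an assumption)."""
    if value is None:
        return "No data"
    if value >= NO_RISK_SECONDS:
        # Not "capping in 4:00:00": the maximum means no current risk.
        return "Not at risk of capping"
    if value == 0:
        return "Already capped"
    return f"Time until capping {format_hms(value)}"


for sample in (None, 14400, 0, 5400):
    print(sample, "->", describe_cpchrmsu(sample))
```

With this reading, only the last sample produces a "Time until capping" label (1:30:00); the value 14400 never does.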
That is why I applied the data filter CPCHRMSU < 14400, triggering the bug described earlier in this topic. At least for the data I was visualizing (which was much richer than the "3-item" example data I posted to this topic), most systems were not at risk of capping. The "at risk" gauges were swamped by all the "Time until capping 4:00:00" noise.
Minor point: I acknowledge that I can use color to distinguish between "at risk" and "not at risk" systems. For example, I can display gauges with the value 14400 (4 hours) in green, and other values in yellow, or red. Still, the user must ignore a sea of green gauges.
I accept that you have deliberately decided to do this.
However, I want to point out that, for this use case, that decision has two undesirable effects:
Noise: in a visualization intended to show only "at risk" systems, that decision means also showing systems that are not at risk
A false statement: "Time until capping 4:00:00"
One might argue, "Okay, so there's noise. But those gauges are green, and informed users—subject-matter experts who understand the details of the underlying calculation—will understand that 'Time until capping 4:00:00' doesn't really mean that."
I make the following general statement, which I think has broad applicability beyond this topic: ironic humor aside, making a statement that is literally false—with the excuse that readers or listeners will understand that statement to be literally false, and instead understand it to mean something else—is perilous.
I'd appreciate your thoughts and any new suggestions on this.
Meanwhile, following your existing suggestion, I've started using non-TSVB gauges, with a "Top Hit" aggregation (which, counterintuitively, and please correct me if I'm wrong, appears to correspond to the TSVB "Last value" option) with a CPCHRMSU < 14400 filter "baked-in" to the visualization. That filter omits the noise.
(I've contacted the original viz developer: they don't need to drop the last bucket.)
One unfortunate consequence of "omitting the noise", though (and here, you might accuse me of wanting to have my cake and eat it, too): in TSVB, the extent of (angle covered by) the gauge arc was relative to 14400, whereas in the non-TSVB gauges the arcs are relative to the largest value in the displayed results. I'd like to make them relative to the maximum value of 14400, and I wish I knew how to do that without re-introducing the "noise".
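To illustrate the difference in arithmetic only (this does not show how to configure either kind of gauge), here is a small sketch comparing the two scalings. The system labels and values are hypothetical.

```python
FIXED_MAX_SECONDS = 14400  # the maximum value of CPCHRMSU


def arc_fraction(value, scale_max):
    """Fraction of the full gauge arc covered by value, clamped to the range 0..1."""
    return max(0.0, min(value / scale_max, 1.0))


# Hypothetical displayed values, after the 14400 "noise" has been filtered out.
displayed = {"sysA": 3600, "sysB": 7200}

relative_to_fixed = {k: arc_fraction(v, FIXED_MAX_SECONDS) for k, v in displayed.items()}
relative_to_largest = {k: arc_fraction(v, max(displayed.values())) for k, v in displayed.items()}

print(relative_to_fixed)    # {'sysA': 0.25, 'sysB': 0.5}
print(relative_to_largest)  # {'sysA': 0.5, 'sysB': 1.0}
```

Scaled against the largest displayed value, the least urgent system always fills the whole arc, which hides how far every system is from the 4-hour maximum.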