Group by only for the latest date in pie

Andy_Fan · July 26, 2016, 5:38am

I have the following data.
|date |cluster_name|database_name|table_name|table_size
|2016-05-12|X|DB1|table1|100
|2016-05-12|X|DB1|table2|200
|2016-05-12|X|DB1|table3|700
|2016-05-13|X|DB1|table1|100
|2016-05-13|X|DB1|table2|200
|2016-05-13|X|DB1|table3|700

given some filter like cluster_name='X' and database_name='DB1'. Ｉcan generate a beautiful report for the data size in history with line chart.

but what I want here is pie chart to show something like this:
TABLE1: 100G(10%)
TABLE2: 200G(20%)
TABLE3: 300G(70%)
this data is from the latest data.

but the issue here is if I choose time range 1 day, I can get the correct summary table.
TABLE1: 100G(10%)
TABLE2: 200G(20%)
TABLE3: 300G(70%)

but I can't get the history size trendecy.

but If I choose time range as 2 days, I can't figure out a way to show the latest status correctly.

tylersmalley · July 26, 2016, 7:59pm

Hopefully, I understand what you are wanting to accomplish. You want the visualization on the dashboard to always display the current (last 24 hours) regardless of if the timepicker covering a larger period?

I was able to accomplish this by the following:

On Discover, add the desired filter cluster_name='X' and database_name='DB1'.
Select the date facet, and add one of the items to the filter.
Edit the date filter, you will then have a JSON representation of the filter. You can then replace it with a range:

{
  "query": {
    "range": {
      "date": {
        "gt": "now-1d"
      }
    }
  }
}

You can then save the search, create a visualization based on it and add it to your dashboard.

Hopefully this is what you were looking for.

Andy_Fan · July 27, 2016, 2:29am

Thanks you. You understand me correctly.

I followed your instructions until step 4, (create a visualization based on it). I run into some new issues I didn't expect before.

I want to create a "pie", which will show the biggest table by data size. what I can do is
"Aggregation Sum(table_size)"
"split Slices" -> "table_name"
"filter by ": cluster='cluster' and table='table_name'

The issue is "we may not load the data once per day exactly", sometimes, it may more than once(like for some test?) or less than once(means not data in last 24 hours) due to some error which happens recently.

But I want the dashboard always show the latest data.
for example: If we load data twice on 2015-05-12, one is one 1:00 pm, the other one is 2:00 pm. I want to only care about the data in 2:00 pm.

and if we loss the data in the recent data, we will use the latest data in the previous day.

is it possible?

I will try my best to keep the data once per day, but it would be great that dashboard can handle such case also.

Thanks

tylersmalley · July 27, 2016, 5:30am

The only way I could see this being possible is if you were indexing the difference. Instead of indexing the size at a given time you would index the change since the previous indexing. Having this would allow you to simply SUM the size field. It would also be easier for capacity planning as you would have a cumulative graph.

Andy_Fan · July 28, 2016, 12:25am

this will make the writer a bit of harder:)

maybe the another solution is I store the latest data into a another index. in dashboard, I just show the latest information in the other index and show the history data in the original index. actually I didn't like this one also..

tylersmalley · July 28, 2016, 12:28am

You could use an alias for your stats which points to the most recent index.

When your script inserts the data, it does so into a new index. After it's done inserting you can update the alias to point to the new index.

Andy_Fan · July 28, 2016, 12:38am

this is a really cool suggestion and exactly resolved my uncomfortable about the this solution! I will try to do this and give a final update here. suppose everything should works pretty well now:)

tylersmalley · July 28, 2016, 1:08am

Great to hear, keep us updated!

Andy_Fan · August 2, 2016, 4:05am

Sorry for the late update. No more unexpected things happens and the task has been completed.

The main steps includes:

During a data collecting, I store the exact same data into 2 indexes. One is a new index, one is a old index which contains all the history data. The new index only contains the latest one time data.
change a 'xxxsummary' index alias to the new index
create pie chart to show "who is the top 5 biggest table CURRENTLY" with the xxxsummary index.
at the same time, we create history tendency with the history index,
Y Aliax: sum(table_size)
X Alias: Terms + date + ASC + size 10000(a enough big value)
Split line: table_name

Topic		Replies	Views
How to visualize data with the latest date Kibana	2	1536	July 31, 2017
Dashboard Time Frame Not Affecting Pie Chart or Table Visualizations Kibana	3	740	December 23, 2020
Lens - Pie Chart Summary Kibana visualisation	5	221	March 16, 2024
Static Pie Chart Kibana	10	23	November 29, 2024
Time based count in Kibana Visualization Kibana	3	494	October 21, 2021

Group by only for the latest date in pie

Related topics