Data is already aggregated in the Logstash feed; how do I build a Kibana visualization on these aggregated metrics?

Hi All,
I am a bit new to ELK and have been researching this for a while; most forums and links I found indicate this is not possible. Happy to be pointed to an existing thread (open or closed) to review as well.

Scenario:
Running Kibana 7.1.1 on RHEL7
I have a Python script that queries several databases, does counts, and then logs the results into a CSV.
Example CSV output (timestamp, region, datatype, count of rows):
2020/03/01 10:00,US,clients,10
2020/03/01 10:00,US,vendors,12
2020/03/01 10:00,US,warehouses,3
2020/03/01 10:00,CA, vendors,10
2020/03/01 10:00,CA,clients,10
2020/03/01 10:00,CA, warehouses,10
2020/03/01 10:05,US,clients,10
2020/03/01 10:05,US,vendors,12
2020/03/01 10:05,US,warehouses,3
2020/03/01 10:05,CA, vendors,10
2020/03/01 10:05,CA,clients,10
2020/03/01 10:05,CA, warehouses,10

What I'm trying to achieve is to let Kibana users plot the data that was already aggregated: each datatype's count (Y-axis) over the time series (X-axis).

I see my data correctly when querying via the Kibana UI; however, when I try to create the metric on a Kibana dashboard, the obvious options all map to aggregations, but I just want to plot the actual values.

Any suggestions on how to do this? Should I change how I input the data?

I don't want to log all the raw data; we are talking billions and, by the end of the day, trillions of rows across all databases, which is why I aggregate during the data fetch rather than having Logstash fetch everything and letting Kibana aggregate. My hardware is 2x 768 GB memory, 72-core, 2 TB SSD machines, so I do have compute power, but I don't want to waste disk.

It's fine to store pre-aggregated data in Elasticsearch, and it should be possible to use Visualizations to show these individual data points just fine - it's simply a special case of aggregating where a single value goes into each aggregation.

For your case, create a line chart with a date histogram on the timestamp on the X-axis (with a minimum interval of 5 minutes) and a sum of the count field on the Y-axis. In most cases (depending on your time range) there will be just a single value per bucket, so the "sum" of a single value is the value itself, which effectively plots the individual values.
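Under the hood, that visualization translates to an Elasticsearch query along these lines (a minimal sketch; the logstash-* index pattern and the @timestamp/count field names are assumptions based on the default Logstash output and your CSV columns):

    GET /logstash-*/_search
    {
      "size": 0,
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "@timestamp", "interval": "5m" },
          "aggs": {
            "total": { "sum": { "field": "count" } }
          }
        }
      }
    }

Whenever only a single document falls into a bucket, the "sum" is just that document's own count.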

thanks Flash!

Looking at my Y-axis: when I set Sum under Aggregation, the only fields offered are "geoip.latitude/longitude". Poking around a bit, I gather that my Elasticsearch mapping for this index may be wrong. My Logstash config is:

    input {
        beats {
           port => 5044
        }
        file {
            path => "/my/raw/data.csv"
            type => "order-data-by-region"
            exclude => "*.gz"
            start_position => "beginning"
            sincedb_path => "/var/tmp/since.db"
        }
    }
    filter {
        csv {
            columns => [
                "queryDate",
                "region",
                "datatype",
                "count"
            ]
            separator => ","
            remove_field => ["message"]
        }
    }
    output {
        elasticsearch {
            hosts => ["http://localhost:9200"]
        }
        stdout {}
    }

Given my logstash.conf, I think I need to tell the filter { } to parse the count as an integer, so something like changing

        columns => [
            "queryDate",
            "region",
            "datatype",
            "count"
        ]

to

        convert => {
            "queryDate" => date_time,
            "region" => string,
            "datatype" => string,
            "count" => integer
        }

hope I am on the right path

thanks again for any help

That approach sounds good. You can verify by checking the mapping (GET /my-index/_mapping) - it should list the count field as a number type (e.g. integer).
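For example, with an index named my-index (a placeholder for whatever your Logstash output index is actually called), the relevant part of the GET /my-index/_mapping response would look something like:

    {
      "my-index": {
        "mappings": {
          "properties": {
            "count": { "type": "long" }
          }
        }
      }
    }

Note that dynamic mapping usually picks long rather than integer for whole numbers; either works fine for the sum aggregation.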

If that's the case, refresh the index pattern in Kibana (the reload button in the top right on the index pattern management page) for Kibana to pick up the changes, then the count field should show up when selecting the sum aggregation.

Awesome, got it working! Ultimately I just added

    convert => {
        "count" => "integer"
    }

(for other readers: no commas, and quote the value!)
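For reference, the whole filter block then ends up looking roughly like this (a sketch assembled from the snippets above, not my exact final config):

    filter {
        csv {
            columns => ["queryDate", "region", "datatype", "count"]
            separator => ","
            convert => {
                "count" => "integer"
            }
            remove_field => ["message"]
        }
    }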

One last question, Flash, if you can - in the visualization, if I am doing a vertical bar chart where the X-axis is the histogram over @timestamp, for my Y-axis I want a stacked bar chart on the data.

so let's say on the data set:

2020/03/01 10:00,US,clients,10
2020/03/01 10:00,US,vendors,12
2020/03/01 10:00,US,warehouses,3
2020/03/01 10:00,CA, vendors,10
2020/03/01 10:00,CA,clients,10
2020/03/01 10:00,CA, warehouses,10
2020/03/01 10:05,US,clients,10
2020/03/01 10:05,US,vendors,12
2020/03/01 10:05,US,warehouses,3
2020/03/01 10:05,CA, vendors,10
2020/03/01 10:05,CA,clients,10
2020/03/01 10:05,CA, warehouses,10

I want to stack by each "type" (column 3) per time bucket; do you have any pointers for setting this up in Kibana?

I set up the X-axis as:
first layer: split series on a date histogram
second layer: sub-aggregation with terms on the field "type"

This stacks it, but the time series moves to the right-side Y-axis and the X-axis becomes the values of type.

thanks again!

It sounds like you want the following configuration:

  • First bucket agg: split series by terms on the field type
  • Second bucket agg: X-axis date histogram on the timestamp
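In query terms, that ordering nests the date histogram inside the terms split rather than the other way around - roughly like this (a sketch; datatype.keyword assumes the column-3 value is indexed under datatype with a keyword sub-field, so adjust to whatever your field is actually called):

    GET /logstash-*/_search
    {
      "size": 0,
      "aggs": {
        "by_type": {
          "terms": { "field": "datatype.keyword" },
          "aggs": {
            "per_interval": {
              "date_histogram": { "field": "@timestamp", "interval": "5m" },
              "aggs": {
                "total": { "sum": { "field": "count" } }
              }
            }
          }
        }
      }
    }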

great thanks so much Flash. I was able to get this all working.

one final thing.
I am seeing the following error now on my dashboard:

"Request to Elasticsearch failed.  {"error":{"root cause":"Too_many_buckets_exception", "reason":"trying to create too many buckets.  must be less than or equal to 10000..."

So I checked my Elasticsearch shard settings: I have 1 shard / 1 replica (default settings). I am going to try to increase my shard size (to 8k? from reading some other forums).

thanks so much for your help again!

This is not related to the shard size; it's about an aggregation that is very expensive because it creates too many individual buckets. Try using "Auto" for the interval or a smaller time span.
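To see why the limit gets hit: the total bucket count is roughly the number of date histogram buckets times the number of terms in the split. For example, 30 days at a fixed 5-minute interval is 30 × 288 = 8,640 time buckets, and splitting those by 3 datatypes gives 25,920 buckets - well past the 10,000 default. With "Auto", Kibana picks a coarser interval for longer time ranges so the product stays under the limit.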

Looks great, thanks to your help Flash.

All the best.
