Data is already aggregated in the Logstash feed; how do I build a Kibana visualization on these aggregated metrics?

Hi All,
I am a bit new to ELK and have been researching this for a while; most forums and links I found indicate this is not possible. Happy to be pointed to an existing thread (open or closed) to review as well.

Scenario:
Running Kibana 7.1.1 on RHEL7
I have a Python script that queries several databases, does counts, and then logs the results into a CSV.
Example CSV output (timestamp, region, datatype, count of rows):
2020/03/01 10:00,US,clients,10
2020/03/01 10:00,US,vendors,12
2020/03/01 10:00,US,warehouses,3
2020/03/01 10:00,CA, vendors,10
2020/03/01 10:00,CA,clients,10
2020/03/01 10:00,CA, warehouses,10
2020/03/01 10:05,US,clients,10
2020/03/01 10:05,US,vendors,12
2020/03/01 10:05,US,warehouses,3
2020/03/01 10:05,CA, vendors,10
2020/03/01 10:05,CA,clients,10
2020/03/01 10:05,CA, warehouses,10

What I'm trying to achieve is to let Kibana users plot the data that was already aggregated: each datatype's count (Y-axis) over the time series (X-axis).

I see my data correctly when querying via the Kibana UI; however, when I try to create the metric on a Kibana dashboard, the obvious options all map to aggregations, but I just want to plot the actual values.

Any suggestions on how to do this? Should I change how I input the data?

I don't want to log all the raw data; we are talking billions and, by the end of the day, trillions of rows across all databases, which is why I aggregate during the data fetch rather than having Logstash fetch everything and letting Kibana aggregate. My hardware is 2x 768 GB memory, 72-core, 2 TB SSD machines, so I do have compute power, but I don't want to waste disk.

It's fine to store pre-aggregated data in Elasticsearch, and it should be possible to use Visualizations to show these individual data points just fine - it's simply a special case of aggregating where a single value goes into each aggregation.

For your case, create a line chart with a date histogram on the timestamp on the X-axis (with a minimum interval of 5 minutes) and a sum of the count field on the Y-axis. In most cases (depending on your time range) there will be just a single value per bucket, so the "sum" of a single value is the value itself, which effectively plots the individual values.
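Under the hood, that visualization translates to an Elasticsearch query along these lines (a minimal sketch; the logstash-* index pattern and the @timestamp/count field names are assumptions based on the default Logstash output and your CSV columns):

    GET /logstash-*/_search
    {
      "size": 0,
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "@timestamp", "interval": "5m" },
          "aggs": {
            "total": { "sum": { "field": "count" } }
          }
        }
      }
    }

Whenever only a single document falls into a bucket, the "sum" is just that document's own count.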

thanks Flash!

Looking at my Y-axis: when I set Sum under Aggregation, the only fields offered are "geoip.latitude/longitude". Poking around a bit, I gather that my Elasticsearch mapping for this index may be wrong. My Logstash config is:

    input {
        beats {
           port => 5044
        }
        file {
            path => "/my/raw/data.csv"
            type => "order-data-by-region"
            exclude => "*.gz"
            start_position => "beginning"
            sincedb_path => "/var/tmp/since.db"
        }
    }
    filter {
        csv {
            columns => [
                "queryDate",
                "region",
                "datatype",
                "count"
            ]
            separator => ","
            remove_field => ["message"]
        }
    }
    output {
        elasticsearch {
            hosts => ["http://localhost:9200"]
        }
        stdout {}
    }

Given my logstash.conf, I think I need to tell the filter { } to parse the count as an integer, so something like changing

        columns => [
            "queryDate",
            "region",
            "datatype",
            "count"
        ]

to

        convert => {
            "queryDate" => date_time,
            "region" => string,
            "datatype" => string,
            "count" => integer
        }

hope I am on the right path

thanks again for any help

That approach sounds good. You can verify by checking the mapping (GET /my-index/_mapping) - it should list the count field as a number type (e.g. integer).
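For example, with an index named my-index (a placeholder for whatever your Logstash output index is actually called), the relevant part of the GET /my-index/_mapping response would look something like:

    {
      "my-index": {
        "mappings": {
          "properties": {
            "count": { "type": "long" }
          }
        }
      }
    }

Note that dynamic mapping usually picks long rather than integer for whole numbers; either works fine for the sum aggregation.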

If that's the case, refresh the index pattern in Kibana (the reload button in the top right on the index pattern management page) for Kibana to pick up the changes, then the count field should show up when selecting the sum aggregation.

Awesome, got it working! Ultimately I just added

    convert => {
        "count" => "integer"
    }

(for other readers: no commas, and quote the value!)
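For reference, the whole filter block then ends up looking roughly like this (a sketch assembled from the snippets above, not my exact final config):

    filter {
        csv {
            columns => ["queryDate", "region", "datatype", "count"]
            separator => ","
            convert => {
                "count" => "integer"
            }
            remove_field => ["message"]
        }
    }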

One last question, Flash, if you can - in the visualization, if I am doing a vertical bar chart where the X-axis is the histogram over @timestamp, for my Y-axis I want a stacked bar chart on the data.

so let's say on the data set:

2020/03/01 10:00,US,clients,10
2020/03/01 10:00,US,vendors,12
2020/03/01 10:00,US,warehouses,3
2020/03/01 10:00,CA, vendors,10
2020/03/01 10:00,CA,clients,10
2020/03/01 10:00,CA, warehouses,10
2020/03/01 10:05,US,clients,10
2020/03/01 10:05,US,vendors,12
2020/03/01 10:05,US,warehouses,3
2020/03/01 10:05,CA, vendors,10
2020/03/01 10:05,CA,clients,10
2020/03/01 10:05,CA, warehouses,10

I want to stack by each "type" (column 3) per time bucket; do you have any pointers for setting this up in Kibana?

I set up the X-axis as:
first layer: split series on a date histogram
second layer: sub-aggregation with terms on the field "type"

This stacks it, but the time series moves to the right-side Y-axis and the X-axis becomes the values of type.

thanks again!

It sounds like you want the following configuration:

  • First bucket agg: split series by terms on the field type
  • Second bucket agg: X-axis date histogram on the timestamp
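In query terms, that ordering nests the date histogram inside the terms split rather than the other way around - roughly like this (a sketch; datatype.keyword assumes the column-3 value is indexed under datatype with a keyword sub-field, so adjust to whatever your field is actually called):

    GET /logstash-*/_search
    {
      "size": 0,
      "aggs": {
        "by_type": {
          "terms": { "field": "datatype.keyword" },
          "aggs": {
            "per_interval": {
              "date_histogram": { "field": "@timestamp", "interval": "5m" },
              "aggs": {
                "total": { "sum": { "field": "count" } }
              }
            }
          }
        }
      }
    }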

great thanks so much Flash. I was able to get this all working.

one final thing.
I am seeing the following error now on my dashboard:

"Request to Elasticsearch failed.  {"error":{"root cause":"Too_many_buckets_exception", "reason":"trying to create too many buckets.  must be less than or equal to 10000..."

So I checked my Elasticsearch shard settings: I have 1 shard / 1 replica (default settings). I am going to try to increase my shard size (to 8k? from reading some other forums).

thanks so much for your help again!

This is not related to the shard size; it's about an aggregation that is very expensive because it creates too many individual buckets. Try using "Auto" for the interval or a smaller time span.
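To see why the limit gets hit: the total bucket count is roughly the number of date histogram buckets times the number of terms in the split. For example, 30 days at a fixed 5-minute interval is 30 × 288 = 8,640 time buckets, and splitting those by 3 datatypes gives 25,920 buckets - well past the 10,000 default. With "Auto", Kibana picks a coarser interval for longer time ranges so the product stays under the limit.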

Looks great, thanks to your help Flash.

All the best.
