Any way to graph this?

I'm storing this data via Logstash and trying to find a way to visualize it with Kibana.

{
                                          "@timestamp" => "2016-07-13T21:34:02.123Z",
                                                "type" => "vc_server",
                                            "hostname" => "agslx-hpclavc05",
                                             "/pw/sid" => 12,
              "/static/CLAClarinV3/images/logo-st_png" => 3,
              "/destacados/externos/dx_zonales_2_html" => 3,
       "/static/DESGigyaConnect/images/mail_share_png" => 3,
    "/static/DESGigyaConnect/images/twitter_share_png" => 3
}
{
                                                                 "@timestamp" => "2016-07-13T21:35:02.128Z",
                                                                       "type" => "vc_server",
                                                                   "hostname" => "agslx-hpclavc05",
                                                                    "/pw/sid" => 13,
                                     "/destacados/externos/dx_zonales_2_html" => 7,
    "/static/CLAClarinV3/js/hScroll_js?hash=15f6112d2b9a0f1d1ca3d5c09c9bdf8b" => 3,
                                                                          "/" => 2,
                                    "/static/DESClima/images/v3/chicas/6_png" => 2
}

The thing is that those field names differ most of the time, since they represent the most-requested paths on a backend of our Varnish servers.

Is there a way to tell Kibana to graph only the top 3 fields, based on their values?

For example, with value > 3 as the condition, only "/pw/sid" and "/destacados/externos/dx_zonales_2_html" would be graphed.

That's going to cause a huge mess. If you're storing your data that way, you're basically not going to be able to use it in any real way. Elasticsearch can't tell you what the top 10 fields are, and so neither can Kibana...

You need to change how you're storing that information. What are the numbers on those fields, the number of requests to that path or something? How do you have multiple values with a single timestamp? The only thing I can envision is that you are recording the top 5 requested paths every X seconds... is that right?

I see; how would you recommend storing them? The numbers on those fields are the number of requests to that URL in the last 60 seconds.

I have multiple values with a single timestamp because I'm using multiline, although it isn't working as I intend it to. Maybe I should ask for help with that in the Logstash section.

If you're just trying to log and count requests, your best bet is to log every request that comes through Varnish, and let Elasticsearch handle the counting and time ranges and such for you. It can do all the slicing and dicing for you; that's its whole value. :)

So, for every request, you create a new document with a timestamp, type, hostname, and path (the paths are currently your field names, e.g. /pw/sid). You can also grab other information if you'd like, such as the originating IP, headers, etc. The more types of data you collect about each request, the more insight you can get from it all.
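
For example, each request could become a document shaped something like this (client_ip and status are just suggestions for extra fields, and path is a suggested name, not anything Logstash produces on its own):

    {
      "@timestamp": "2016-07-13T21:34:02.123Z",
      "type": "vc_server",
      "hostname": "agslx-hpclavc05",
      "path": "/pw/sid",
      "client_ip": "203.0.113.42",
      "status": 200
    }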

Once you're storing things that way, you can ask Elasticsearch, via Kibana or whatever method you want, what the top X requests were. You can filter by time ranges and servers, bucket by specific time intervals, and do all the stuff you're trying to do.
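
As a sketch, "top 10 paths in the last hour" becomes a single Terms aggregation (assuming the per-request documents above, an index pattern like varnish-*, and a path field; Kibana builds roughly this kind of query for you behind the scenes):

    curl 'localhost:9200/varnish-*/_search?pretty' -d '
    {
      "size": 0,
      "query": { "range": { "@timestamp": { "gte": "now-1h" } } },
      "aggs": {
        "top_paths": { "terms": { "field": "path", "size": 10 } }
      }
    }'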

I should note that the mappings (i.e. the field names, the left side of the document) are now fixed, and Elasticsearch can run these queries very quickly. It also allows you to correctly size your nodes and cluster and scale your data.

OK, with no filter the data I get looks like this:

{
    "@timestamp" => "2016-07-13T22:11:27.363Z",
          "beat" => {
        "hostname" => "AGSLX-HPCLAVC05",
            "name" => "AGSLX-HPCLAVC05"
    },
         "count" => 1,
        "fields" => nil,
    "input_type" => "log",
       "message" => "/static/DESGigyaConnect/images/add_share.png 3.00",
        "offset" => 466,
        "source" => "/home/gdobboletta/varnish.log",
          "type" => "vc_server",
      "@version" => "1",
     "parameter" => "/static/DESGigyaConnect/images/add_share_png",
         "value" => 3,
      "hostname" => "agslx-hpclavc05"
}

There I have the parameter:value pair. How can I ask Kibana to show me the "parameter" field based on the biggest "value"?

edit: This is what I mean: [screenshot]

Don't forget to refresh your mappings in Kibana. And don't index these new documents into the same index, otherwise the data is going to look very strange. Either delete the index, or create a new one to start writing to (and add that new index pattern to Kibana).

If you want to create a visualization, use a Date Histogram on @timestamp on the X-Axis. If you want to look at a specific set of time intervals, like every hour, adjust the interval to match what you want. Next, add another bucket to split the bars (or lines, or area) and use a Terms aggregation on the parameter field. By default it'll show you the Top 5, but you can adjust this.

Now click the play button at the top.

The first thing you'll notice in the visualization is that you'll have more than 5 (or whatever you set it to) items in the legend. This is by design: it's showing the Top X for each bucket (each @timestamp interval you set, or the one Kibana picked in Auto mode).

If you're just trying to see the top requests for the last hour, for example, set the interval to 1 hour, and set the time range in the timepicker (top right of Kibana) to cover at least the last hour, or the hour you actually care about.
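
For reference, that visualization boils down to roughly this aggregation (a sketch using the parameter field from your document; run it against whatever index you're writing to):

    {
      "size": 0,
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "@timestamp", "interval": "1h" },
          "aggs": {
            "top_parameters": { "terms": { "field": "parameter", "size": 5 } }
          }
        }
      }
    }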

OK, I've followed all of your advice and created a separate index for this kind of data. I tried graphing the data as you suggested, but the Terms aggregation is not giving me accurate results. On the right I see that the strings, since they're URLs, are decomposed into several terms instead of just one.

Here: [Kibana screenshot]

Oh, whoops, I probably should have seen that one coming.

The problem is that the field is analyzed, which means that Elasticsearch analyzes the string and breaks it up into multiple terms. So, for example, /static/DESGigyaConnect/images/add_share_png becomes a collection of the strings static, DESGigyaConnect, images, and add_share_png.
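
You can see this for yourself with the _analyze API (assuming Elasticsearch is listening on localhost:9200; the standard analyzer is the default for string fields):

    curl 'localhost:9200/_analyze?analyzer=standard&text=/static/DESGigyaConnect/images/add_share_png'

The response lists the individual tokens the path was split into, and those tokens are what your Terms aggregation has been counting.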

You'll need to adjust your mappings and re-index. Based on the one document you posted, I'm pretty sure you don't want any of your string fields to be analyzed in the mapping definition.

At the very least, you probably want something like this:

{
  "mappings": {
    "vc_server": {
      "properties": {
        "hostname":  { "type": "string", "index": "not_analyzed" },
        "message":   { "type": "string", "index": "not_analyzed" },
        "source":    { "type": "string", "index": "not_analyzed" },
        "parameter": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

Once you've defined the mappings, you'll want to remove your new index in Elasticsearch and re-index the records.
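
Concretely, something like this (using agslx-hpclavc05-varnishtop as a stand-in index name; swap in whatever your Logstash output actually writes to):

    curl -XDELETE 'localhost:9200/agslx-hpclavc05-varnishtop'

    curl -XPUT 'localhost:9200/agslx-hpclavc05-varnishtop' -d '
    {
      "mappings": {
        "vc_server": {
          "properties": {
            "hostname":  { "type": "string", "index": "not_analyzed" },
            "message":   { "type": "string", "index": "not_analyzed" },
            "source":    { "type": "string", "index": "not_analyzed" },
            "parameter": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }'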

I've deleted the index and recreated it with the correct mapping, but every time I run Logstash again it re-adds the mappings. Here's an example:

[Pastebin link]

I've tried it a few times, and yet Logstash creates the mappings again. All I have in my output is this:

output {
  elasticsearch {
    index => "agslx-hpclavc05-varnishtop"
    hosts => ["localhost:9200"]
  }
}
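
One thing worth knowing here: Elasticsearch's dynamic mapping will auto-map any field your explicit mapping doesn't cover (type, input_type, the beat.* fields, etc.) as an analyzed string the moment the first document arrives, which can look like Logstash is re-adding mappings. A sketch of a dynamic template that forces every new string field to not_analyzed, added to the index-creation body from before:

    {
      "mappings": {
        "vc_server": {
          "dynamic_templates": [
            {
              "strings_not_analyzed": {
                "match_mapping_type": "string",
                "mapping": { "type": "string", "index": "not_analyzed" }
              }
            }
          ]
        }
      }
    }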

OK, I got it working perfectly, except for the "size" part. If I set it to 1, it shows only one request and works fine, but if I set it to 3, for example, it shows more than 3. Is there any reason why this happens?

[example screenshot]

Yup, this is what I was talking about earlier. It's actually showing you the top X (1, 3, whatever) per bucket. That is, per time interval. The fact that you only see 1 result when you choose Top 1 just means that the #1 path doesn't happen to change in your existing dataset, not that you are only ever going to get 1 result.

You could change the order of the buckets though, and you will get the limit you expect. That is, split the bars/lines/area first, with a Top X limit, and then add the Date Histogram on the X-Axis. This will show you only the top X values over whatever time range you are using in Kibana.

There's a tradeoff here, though: if some buckets/time intervals don't contain one of the top X values, then you won't see all X values for that interval; that is, you may see X-1 or fewer values in that bucket. The other, perhaps more serious, tradeoff is that some values may not show up at all, even though they far outnumber the others at some point in time.

Let's say you are looking for the top X parameter values over the last 24 hours, grouped by hour. Elasticsearch queries that data in order, so it will actually query your data over the full 24 hours first to get the top X values, and then show how those values break down for each hour. However, let's say that within that timeframe there was 1 hour where a single parameter was getting hammered and saw a TON of requests (say, it was a DoS attack or something)... but that only happened for an hour, or even a few minutes. In the grand scheme of things, over those 24 hours, that value is insignificant. It's probably important that you know about it, but it doesn't land in the top X for the last 24 hours, and won't show up at all.

That's the tradeoff: do you want to see the top X values over the entire time range, or the top X values for each time bucket? Depending on your answer, you'll want to order the bucket configuration on the visualization differently. Of course, you may decide that both are important, in which case you should create 2 different visualizations and add both to a dashboard, so you get a better overall picture.
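
In raw query terms, swapping the bucket order just means nesting the Date Histogram inside the Terms aggregation instead of the other way around (same sketch assumptions as before):

    {
      "size": 0,
      "aggs": {
        "top_parameters": {
          "terms": { "field": "parameter", "size": 3 },
          "aggs": {
            "per_interval": { "date_histogram": { "field": "@timestamp", "interval": "1h" } }
          }
        }
      }
    }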

I hope that makes sense, but if anything still isn't clear, let me know.

Now I understand it completely. Thank you SO much for the detailed explanation; the behavior it has now is EXACTLY what we needed. Your explanations are clear even for someone whose main language isn't English and who is just starting with the ELK stack. Really appreciate it!
