ELK vs grafana+influxDB


(Yehosef) #1

We are starting to move our logging from MixPanel and SQL to Elasticsearch+Kibana. In the process we've discovered Grafana and InfluxDB (alias G/I) and it looks very nice. Grafana seems much more sophisticated than Kibana, and InfluxDB also looks very flexible and promising.

I'm looking for some insight into where ELK outshines G/I. I could understand it if we needed any text analysis - but currently we don't (we use ES separately for search). We're just using this system for analytics.


(Robin Moffatt) #2

I use both side by side, for different purposes. Grafana + InfluxDB for purely time series metrics (specifically, monitoring applications & servers), and ELK for monitoring/diagnostics against log file sources etc. My impression was you can work with pure time series metrics in Elasticsearch but they sit much more naturally in InfluxDB. Right tool, right job.


(Yehosef) #3

@rmoff - thanks for the info. Interesting to hear. What we're looking into is supplementing MixPanel to give us business analytics information, e.g. how many logins we get and how that changes over time. Kibana is very limited in what we can do with more sophisticated questions - e.g. plot against a 30/60 day average, compare today to the average for this day of the week over the last month (some days naturally have more traffic), or show what percentage of sessions performed a certain action (the percentages in Kibana line charts don't seem to do this natively, AFAIK). And of course, I want to be able to filter these trends by referrer or device, among many other questions.

I don't know if Grafana will answer these questions either - but it seems to support more options out of the box of moving averages or comparing chart to another chart, multiple timeframes per dashboard, etc.

Do you have ideas if one of these tools is up to these kinds of questions, or if there are other tools that might be better?
thanks,


(Robin Moffatt) #4

Yes I would say definitely Grafana here rather than ELK. Or to be more
precise, InfluxDB, which supports the kind of aggregations you're talking
about.


(Yehosef) #5

Thanks for the info - do you have any references of people using grafana/influxdb for these problems? In the end, we think we know how to get the answers we need from elasticsearch (by doing other aggregations and analyzing those); the problem is mainly that we need to visualize that, and Kibana seems limited.

Our options are:

  1. extend kibana to have it do what we want (we're comfortable with angular so this is a real option)
  2. build our own interface to talk with elasticsearch and manually building the aggs according to what we need
  3. moving to another stack like grafana and influx (assuming it'll solve these problems with less work.)

So far on the analysis side we're happy with elasticsearch and I'm hesitant to move to something else because I don't know what I'm sacrificing. Do you know what are the limitations of influx compared to elasticsearch?

Thanks!


(Yehosef) #6

FYI - looks like moving averages are coming - https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline.html


(Elvar) #7

I agree with Robin here, use the right tool for the right job.

Logs go to Elasticsearch
Metrics to InfluxDB

But you can easily combine the two and have Logstash send metric data to InfluxDB. There is also a feature request in Grafana to add support for Elasticsearch.

I did a comparison of the two before I decided to use InfluxDB+Grafana: storing the same amount of metrics in Elasticsearch takes at least 10x the space it does in InfluxDB.

Just use both for now, both are excellent.


(Yehosef) #8

@elvarb - thanks for the info. I'm a little confused as to the differences between logs and metrics. The way I understand it, logs make metrics. If you cut out all the detail to one number per time frame (eg, logins per day) then that will be small and fast but you've lost much of the detail in being able to slice and dice it (eg how does that break down by traffic sources or visitor type, actions done, etc.)
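A minimal sketch of that trade-off, using made-up login events (the field names and values are hypothetical): the per-day count is tiny, but only the raw events can still be broken down by another dimension afterwards.

```python
from collections import Counter

# Hypothetical login events; field names and values are illustrative only.
events = [
    {"day": "2015-07-01", "source": "search", "device": "mobile"},
    {"day": "2015-07-01", "source": "email", "device": "desktop"},
    {"day": "2015-07-01", "source": "search", "device": "desktop"},
    {"day": "2015-07-02", "source": "search", "device": "mobile"},
]

# "Metric" view: one number per day. Small and fast, but the dimensions are gone.
logins_per_day = Counter(e["day"] for e in events)


def breakdown(events, day, field):
    """Slice one day's events by any field; this needs the raw events."""
    return Counter(e[field] for e in events if e["day"] == day)


print(dict(logins_per_day))
print(dict(breakdown(events, "2015-07-01", "source")))
```

Once only `logins_per_day` is stored, a question like "logins by source" can no longer be answered from it.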

And if I want to condense it down into one number like that for ELK, I can create the metrics myself or in logstash (https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html).

Do you have more detail about what's considered logs and what's metrics in the BI context?

Also - for your storage comparison, did you tweak your mapping? We're using https://gist.github.com/yehosef/f96dc491bcd5ee9bf7d3#file-template-config and we've got 250M rows on a 16G machine with about 70GB data (originally about 300-400GB raw json). I don't know how this would compare with influx - do you have some specific numbers for comparison?

Outside of moving averages, and storage requirements, do you have specific things that influx+grafana can do that EK doesn't?


(Mark Walkom) #9

You're right, it's simple to grab metrics from many application logs; however, some things like lower level OS metrics aren't that simple to get. We're definitely aiming to cover the latter with [Beats](https://www.elastic.co/products/beats).

However I currently monitor my VPC with collectd+ELK, it's probably overkill for a 2vCPU/2GB VM, but it works a treat!

Just on the tweaked mapping, disabling _source means you cannot reindex your data, that may not be a problem for you. Disabling _all means you lose your search shortcut and you need to be specific on what field to search, that may not be a problem if you are doing metric level dashboards, but it won't be best for all uses.


(Tanya Bragin) #10

Agreed with "right tool for the right job". While Elasticsearch is a more general solution, compared to time-series databases, it may make sense for certain time-series workloads, and we hear many reports of our users using it as a time-series DB.

One case in point: here is a recent independent evaluation from CERN that was presented at the 21st International Conference on Computing in High Energy and Nuclear Physics in April 2015: http://cds.cern.ch/record/2011172/files/LHCb-TALK-2015-060.pdf

They compared Elasticsearch to InfluxDB and OpenTSDB and found that Elasticsearch scaled better than InfluxDB and OpenTSDB for their high-scale time-series analytics use case.

See detailed performance benchmarks in the paper.


(Yehosef) #11

@warkolm - about OS metrics - that's just issues about the reporting. there are plenty of ways of getting that information and if graphite has that built in, it'll be easier. But that's orthogonal to elasticsearch and influxDB.

About the mapping tweaks - you're right. You can't reindex and you can't do global searches. But unless you're doing "logging", that doesn't really matter. If you use the standard mapping, every string field will be analyzed (lowercased and tokenized, IIRC), which takes up a huge amount of extra space. And unless that's what you want, it actually makes much of the typical analytics work more difficult. E.g. any field like "keywords" becomes unusable unless there are no spaces. I think it would be helpful if Elastic published more recommendations for template mappings based on different use-cases. The main problem we have with our mapping is that we can't use the "discover" part of kibana because those are not exposed. We're planning on keeping a small subset of the data (last 30 days) with traditional mapping in addition to our main "optimized" indices. This will let us use "discover" to build queries, and then run them on the other indices.

The question I'm trying to understand and answer is what "the right tool for the right job" means in this context. @elvarb said "Logs go to Elasticsearch and metrics to InfluxDB". But I'm still trying to understand what InfluxDB would give us that Elasticsearch doesn't, especially if you understand how to tweak ES.

@tbragin, Thanks for the link to the CERN paper. It's interesting that they couldn't get influx to scale.

Currently the most compelling point for me for influxdb is grafana. But I'm still trying to understand more about its strong points in comparison to ES.


(Mark Walkom) #12

Great idea regarding mapping use cases, I'll take that back internally and see if we can get some material out on it.


(Elvar) #13

There are a few reasons why I think going with InfluxDB for metrics is better than using ELK (for now; that might change with Elasticsearch 2.0, and the CERN paper gave me some doubts as well):

  1. The whole ecosystem around Graphite metrics is huge, you can pick from loads of tools to gather the metrics, loads of tools to aggregate the metrics, loads of tools to store the metrics, loads of tools to view the metrics and loads of tools to monitor the metrics.
  2. Of all those options, InfluxDB solves the aggregation and storage in a very easy to use and easy to manage package. Most other options require more work to get working and to maintain.
  3. Grafana is in my opinion by far the best metric visualization tool available; it works on Graphite, InfluxDB and OpenTSDB. There is an open ticket on the Grafana project about adding Elasticsearch support as well, but I doubt they will start work on that until Elasticsearch 2.0 is released.
  4. By using the Graphite format you can replace nearly every piece with a different solution, so it's very future proof. Metrics 2.0 (basically context support with tags) will be a game changer for everyone, but it's too early to worry about that now; only a handful of solutions support it.
  5. Storing metrics in InfluxDB takes a lot less space than in Elasticsearch. Think about how the Graphite format works: you have a namespace of "sitename.hostname.appname.subname.metricname", a value and a timestamp. In metrics databases that namespace is stored once, plus the value and timestamp for each data point. In Elasticsearch you would have to store it all for each data point, and you would have to analyze the namespace field so you can query it.
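The storage argument in point 5 can be written out as a toy model. This is only a sketch of the reasoning: the 8-byte value and timestamp sizes are assumptions, and the model ignores compression, index structures and any deduplication either system performs, so it is not a measurement of InfluxDB or Elasticsearch.

```python
def series_store_bytes(namespace, points, value_bytes=8, timestamp_bytes=8):
    """Namespace stored once per series, then value + timestamp per point."""
    return len(namespace.encode("utf-8")) + points * (value_bytes + timestamp_bytes)


def document_store_bytes(namespace, points, value_bytes=8, timestamp_bytes=8):
    """Namespace repeated alongside every value + timestamp."""
    per_point = len(namespace.encode("utf-8")) + value_bytes + timestamp_bytes
    return points * per_point


namespace = "sitename.hostname.appname.subname.metricname"
points_per_day = 24 * 60 * 6  # hypothetical: one data point every 10 seconds

once = series_store_bytes(namespace, points_per_day)
repeated = document_store_bytes(namespace, points_per_day)
print(f"namespace stored once: {once} bytes")
print(f"namespace repeated:    {repeated} bytes")
print(f"ratio: {repeated / once:.2f}x")
```

Under these assumptions the gap comes entirely from the length of the namespace relative to the value and timestamp, which is why measuring real disk usage on both systems is still the only reliable comparison.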

In the end, do a test. Pick a metric, send it to both ELK and InfluxDB for a week. Evaluate the disk space usage of both. Test viewing the metrics in Kibana for ELK and Grafana for InfluxDB. This will give you a solid feel for all angles.


(Yehosef) #14

thanks - some feedback about your points:

  1. The ecosystem around ELK is shaping up pretty rapidly - but it could be you're right that there is more tooling around graphite now.

  2. Doesn't say how influx is better than elasticsearch - I've read of people building TB elasticsearch clusters - I'm not sure what examples of bigger influx clusters there are.

  3. Agreed. though I think Kibana has tremendous potential and I hope to see them develop it.

  4. could be - not a factor for us.

  5. It's not clear to me how true this is if one optimizes the index mapping, as I mentioned. Also - you don't analyze the field names - just the field contents. If your values are just numbers, there is no analysis. And the field names will be column pointers, if I understand correctly, so you're not paying the price for the field/column name like you do in mongo. To get the possible field names, you query against the mapping. Someone correct me if this is incorrect.

As an example of sizing - we have data that looks like http://jsonblob.com/55952b0ae4b051e806c87aa1 and the index for one day is 373MB for 1.9M records (approx 200B per record, and there are about 34 fields, so that's about 6B per field). The advantage is that we can use this both for simple metrics (show me the number of mobile visitors I had by hour on this day) and more complex aggregations (show me the number of mobile visitors by top cities, aggregated by browser name, by hour on this day). And I can get this from one data source without knowing from the beginning that I want those numbers.
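The per-record and per-field figures follow from simple division. A quick check, assuming 1 MB means 10^6 bytes (with 2^20 bytes the results are a few percent higher):

```python
index_bytes = 373 * 10**6  # 373 MB for one day's index; 1 MB = 10^6 bytes is an assumption
records = 1_900_000        # 1.9M records
fields = 34                # approximate number of fields per record

bytes_per_record = index_bytes / records
bytes_per_field = bytes_per_record / fields

print(f"{bytes_per_record:.0f} B per record, {bytes_per_field:.1f} B per field")
# prints "196 B per record, 5.8 B per field"
```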

It's possible in influx you could also do this - I'm not sure. But I don't think graphite can. But this is what pulls me towards elasticsearch - the ability to ask and answer questions that I didn't think about when I stored the data.

In the end, you're right. the best approach is to try both tools and see which works best for you. Since ES2 is around the corner - we're probably going to hold off for that since it'll take care of some of the more complex aggregation problems (moving averages..) But thanks for your insights - it's valuable to hear all the details.


(Clinton Gormley) #15

You are correct. And optimizing the index mapping is definitely the way to go. We ship with defaults that try to make things work out of the box for the new user, but spending some time understanding what you're indexing, how it is indexed, and whether or not you need it would be time well spent.

For the pure metrics use case, disabling _all and _source will save you a significant amount of space, but with the disadvantages that Mark pointed out above: not being able to query a catch-all field, and not being able to reindex your documents. As you've said, these things are not important for the metrics use case.

(Note: as an alternative to disabling _source, 2.0 allows you to choose between faster and better compression, which can be updated on the fly.)

Doc-values by default is also the right way to go. In fact it is the default for all fields in 2.0 (except analyzed strings, which are not supported).

The default logstash template adds an analyzed and a not_analyzed (raw) version of every string field, because we can't deduce up front which strings should be treated as keywords and which should be searchable as full-text. Again, this makes things just work out of the box, but it is not optimal.

Choosing the type of string field that you want up front is an easy optimization to make, as long as you know what your documents look like in advance. The two approaches can be combined: add specific mappings for the fields you know about, and rely on dynamic mappings for any fields you introduce later.
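As a rough illustration of combining the two approaches, the sketch below builds an index-template body in Python using 1.x-style mapping syntax. The helper name, index pattern and field names are hypothetical, and the exact option names should be checked against the mapping reference for the Elasticsearch version in use.

```python
import json


def metrics_template(index_pattern, known_fields):
    """Build a template body: explicit mappings for known fields,
    plus a dynamic template for string fields introduced later."""
    properties = {}
    for name, es_type in known_fields.items():
        mapping = {"type": es_type, "doc_values": True}
        if es_type == "string":
            # Treat known strings as keywords rather than full text.
            mapping["index"] = "not_analyzed"
        properties[name] = mapping

    return {
        "template": index_pattern,
        "mappings": {
            "_default_": {
                # Saves space, at the cost of catch-all search and reindexing.
                "_all": {"enabled": False},
                "_source": {"enabled": False},
                "dynamic_templates": [
                    {
                        "strings_as_keywords": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True,
                            },
                        }
                    }
                ],
                "properties": properties,
            }
        },
    }


# Hypothetical index pattern and fields, for illustration only.
body = metrics_template("metrics-*", {"city": "string", "duration_ms": "long"})
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the template definition. Fields listed in `known_fields` get explicit mappings, and any string field that shows up later falls through to the dynamic template.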

Couldn't have put it better myself :slight_smile:

++

I think pipeline aggs are going to transform the types of analytics you can do in Elasticsearch. For those of you not familiar with pipeline aggs: they add the ability to aggregate on the results of other aggregations. For instance, you can:

  • generate a date histogram of the max total new visitors per day, then pipe that into a derivative to see how many were added each day, then pipe that into another derivative to see the growth rate of your user base.
  • use moving averages to smooth your data so that you can see general trends instead of noisy data
  • use moving averages to calculate the 30/60 day average, and use Holt Winters to predict your future 30/60 day averages
  • use bucket scripts to produce a new metric based on one or more other series, eg to calculate the percentage of sessions which performed a particular action
  • use serial differencing to remove seasonal or weekly trends
  • etc...
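To show what these calculations do to a series of per-bucket values, here is a plain-Python sketch. It is not Elasticsearch's implementation and the function names are made up for illustration; edge handling and windowing details in the real pipeline aggregations may differ.

```python
def serial_diff(values, lag):
    """Subtract the value `lag` buckets earlier; None where that is undefined.
    With daily buckets, lag=7 removes a weekly cycle."""
    out = []
    for i, current in enumerate(values):
        previous = values[i - lag] if i >= lag else None
        if current is None or previous is None:
            out.append(None)
        else:
            out.append(current - previous)
    return out


def derivative(values):
    """Change between consecutive buckets (a serial difference with lag 1)."""
    return serial_diff(values, 1)


def moving_average(values, window):
    """Simple trailing mean over up to `window` buckets ending at each bucket."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def percentage(numerator, denominator):
    """Per-bucket ratio of two series, e.g. sessions with an action / all sessions."""
    return [100.0 * n / d if d else None for n, d in zip(numerator, denominator)]


# Hypothetical running total of visitors per day.
total_visitors = [100, 110, 125, 125]
added_per_day = derivative(total_visitors)  # visitors added each day
growth_change = derivative(added_per_day)   # change in the daily additions
smoothed = moving_average(total_visitors, window=2)
```

Chaining `derivative` on its own output mirrors the first bullet: the first pass gives the number added per day, and the second gives how that rate is changing.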

We've focused on the most important pipeline aggs for now, but we'd rather get real user feedback about what is missing, rather than just implementing a bunch of fancy stuff which may not be useful in the wild.

I haven't followed Grafana, which was originally a fork of Kibana 3. What features have they added which we should be adding to Kibana?


(Elvar) #16

The example data you provided is an example of data that should belong in ELK.


(Elvar) #17

I haven't followed Grafana, which was originally a fork of Kibana 3. What features have they added which we should be adding to Kibana?

http://play.grafana.org/

Have fun :smile:


(Yehosef) #18

You can look at their features and playground to see the differences. I have not spent much time looking at the differences personally because it's irrelevant for us now. Grafana doesn't support elasticsearch and we've decided to build our analytics on elasticsearch. But we're currently planning on building our own visualization layer. I'd be happy to share it with you and your team - maybe you can start to move kibana in this direction and we'll all benefit.


(Yehosef) #19

Even though I said we're not looking at grafana - I decided to spend some time with it because we're planning on building our own visualization solution for our problems. And wow, grafana is very nice.

I can see why people say logs to elasticsearch and metrics to influx/graphite - right tool for the right job. I didn't understand it before but now I do.

But it's not exactly how it sounds. The right tool for the right job doesn't have to do with the database/store - but the visualization. Metrics should go to grafana (currently) - it is much more sophisticated for viewing metric type of data. Logging should go to Kibana - grafana wouldn't have any interface for showing or exploring that data.

But there is really nothing I can see on the backend why metrics should go to influx over elasticsearch (once you know how to optimize indices). The issue is that once you put metrics in elasticsearch, you can't analyze them or create dashboards as well as you can in influx.

Hopefully kibana becomes more flexible and powerful in upcoming releases - elasticsearch deserves it :smile:


(Mark Walkom) #20

This has me confused, before you were saying ES can do this, now you mention it cannot?