What Logstash Should Be

After being exposed to Logstash in my current role, I thought I would share what I think Logstash should be. Maybe I don't know enough about the product, but I believe the focus should be on making the product better.

Configuration

This is perhaps one of the biggest areas for improvement that I can see in Logstash. If you have a complex log, configuration becomes a task of days or weeks rather than a few hours. Reading a simple HTTP log is pretty straightforward, but testing the configuration involves deleting Elasticsearch data, resetting the feeder, and running the test again to see if things worked: all very time-consuming work just to check whether a simple configuration is going to work or not.

Ideally you would have a way to run a specific rule against sample data, with a simple output to check that the result was correct.

Further Improvement

This is where I would like to suggest some enhancements:

  1. A web interface for the creation of Logstash rules, i.e. you have a line from a logfile, you highlight the bits that you want, and it automatically determines the configuration. A simple one-click test of the entire configuration against a set of data, and then you know you have the rules right.

  2. The ability to rerun new matching rules against old data. Sometimes you don't log everything, and it would be good to be able to rerun a new rule against data that has already been indexed.

Anyhow, I thought I would just put up some words for thought.

Cheers
Jace

You can do this with a simple stdin input, your filters, and stdout { codec => rubydebug }. But your overall point is taken.
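Concretely, such a throwaway test pipeline might look like this (the grok pattern here is just a placeholder for whatever filter is under test):

```conf
# Throwaway pipeline for testing a filter interactively:
# paste sample log lines on stdin and inspect the parsed
# event that Logstash prints on stdout.
input {
  stdin { }
}

filter {
  # The filter under test goes here, e.g. a grok match.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  stdout { codec => rubydebug }
}
```

Run it with `bin/logstash -f test.conf`, paste a log line, and the rubydebug codec prints every field of the resulting event, so there is no need to touch Elasticsearch at all.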

Regarding the improvements, we are working on making the config API driven, which should help. There are also community tools like http://grokconstructor.appspot.com/ that may help.
Rerunning the rules against already indexed data is tougher, though. It's something we're working on making easier internally in ES, which may help Logstash in the longer term.
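For what it's worth, one clumsy interim workaround is to pull already-indexed events back out with the elasticsearch input plugin, run the new filter over them, and write the results to a fresh index. This is only a sketch: the index names and the grok pattern below are invented for illustration.

```conf
# Hypothetical reprocessing pipeline: re-read old events from
# Elasticsearch, apply a new matching rule retroactively, and
# index the reprocessed events separately.
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-2016.01.*"
    query => '{ "query": { "match_all": {} } }'
  }
}

filter {
  # The new rule to apply to the old data.
  grok {
    match => { "message" => "%{IP:client_ip} %{GREEDYDATA:rest}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-reprocessed"
  }
}
```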

But thanks for the feedback, please keep it coming :smiley:

Yeah, I have found using that tool better than nothing. :slight_smile: Thanks Mark, at least I know I am on the right track.

I have previously written my own monitoring applications, having looked at the market and along the way learnt a lot about what makes a good monitoring system. It makes things hard for Logstash, but it would be great for the rule sets to be database driven. I previously developed a system where you simply highlighted parts of the string and told it what the various elements were. Then that was it: you saved it, and when the system processed a line of data, it determined which rules it needed to run.

Kibana is a different story: I find it looks pretty, but I don't find it very flexible. The main focus I have always had is that on one page you should be able to see everything about the system.

Cheers
Jace

I have previously written my own monitoring applications, having looked at the market and along the way learnt a lot about what makes a good monitoring system. It makes things hard for Logstash, but it would be great for the rule sets to be database driven. I previously developed a system where you simply highlighted parts of the string and told it what the various elements were. Then that was it: you saved it, and when the system processed a line of data, it determined which rules it needed to run.

I'm not sure I understand why storing the rules in a database would have a positive impact on anything.

Kibana is a different story: I find it looks pretty, but I don't find it very flexible. The main focus I have always had is that on one page you should be able to see everything about the system.

So... build a dashboard with multiple charts that cover the aspects you're interested in?

There are several reasons.

  1. Through an interface you could manage the Logstash configuration. This is extremely useful if you have multiple developers who, as part of the development process, are also responsible for metrics. This is something that is really lacking in most systems; monitoring generally happens as an afterthought. By segmenting parts of the matching process, you also make each rule testable. Matching rules become part of the process and are stored in a database. Also, as new parts of interest appear in a log file, you don't have to restart anything or bother with the configuration file: you just take a line that you want to match, create the match, and the system stores it.

  2. Real-time updates to rules without a restart, with an automatic pretest of the configuration. You could then also do inline analysis of the log files. All log files become centralised, and you could have something like a real-time grok view, displayed on a page depending on whether each line matches.

  3. Central configuration of the rules. Configuration files are nasty and can become very big very quickly.

The mantra behind any monitoring system should be "one page, all info", and it shouldn't be difficult to build the ability to have multiple graph types working together, effectively overlaying them.

The first attraction is that Kibana looks "pretty", until you discover that it really isn't that functional or useful, and that you have to add several graphs to get something that you want. The trouble is, upstream management expect something awesome, but all you can deliver is something that is even less than what you could do in Excel 20 years ago. If you look at other graphing packages available, you soon realise how rudimentary it is.

This is perhaps one of the biggest areas for improvement that I can see in Logstash. If you have a complex log, configuration becomes a task of days or weeks rather than a few hours. Reading a simple HTTP log is pretty straightforward, but testing the configuration involves deleting Elasticsearch data, resetting the feeder, and running the test again to see if things worked: all very time-consuming work just to check whether a simple configuration is going to work or not.

Ideally you would have a way to run a specific rule against sample data, with a simple output to check that the result was correct.

I recently published a new tool, Logstash Filter Verifier, that basically lets you write unit tests for your filters. You define sample input events and what you expect to get back from Logstash; LFV then starts Logstash with all your filters, feeds it those events, and verifies that it gets back what you expected. I've found it to be a huge productivity helper and a safety net prior to deploying a new filter configuration.
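To give a rough idea, an LFV test case is a small JSON document pairing raw input lines with the fields you expect in the resulting events. The shape below is from memory and the field values are invented; check the project's README for the exact format.

```json
{
  "fields": {
    "type": "apache"
  },
  "input": [
    "127.0.0.1 - - [06/Feb/2016:10:35:33 +0000] \"GET / HTTP/1.1\" 200 42"
  ],
  "expected": [
    {
      "type": "apache",
      "clientip": "127.0.0.1",
      "response": "200"
    }
  ]
}
```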

Through an interface you could manage the Logstash configuration. This is extremely useful if you have multiple developers who, as part of the development process, are also responsible for metrics. This is something that is really lacking in most systems; monitoring generally happens as an afterthought. By segmenting parts of the matching process, you also make each rule testable. Matching rules become part of the process and are stored in a database. Also, as new parts of interest appear in a log file, you don't have to restart anything or bother with the configuration file: you just take a line that you want to match, create the match, and the system stores it.

But none of this requires Logstash to have a database as its native configuration store.

Real-time updates to rules without a restart, with an automatic pretest of the configuration. You could then also do inline analysis of the log files. All log files become centralised, and you could have something like a real-time grok view, displayed on a page depending on whether each line matches.

Again, real-time updates have nothing to do with database-based configuration stores.

Central configuration of the rules. Configuration files are nasty and can become very big very quickly.

Well, I simply don't agree with you. I've yet to see a database-based configuration store that isn't opaque and clumsy, or that doesn't require special tools and APIs (thereby making it less suitable for automation).