The best input option for Logstash

Hi there,
Our team is working on creating some instant messaging applications.

We would like to gather logs from our applications with Logstash.

As we are developing the applications ourselves, we can use whatever technique we like to produce/store logs before Logstash processes them (we can write them to a file, store them in a DB, pass them over TCP, etc.).

We can use different formats: JSON, plain text, key-value pairs, and so on.

What would be the best way to produce/store logs for subsequent processing in Logstash, to make Logstash's work as easy as possible?

I'd like to avoid using grok (because it is CPU-intensive), the multiline filter (because I had a problem with it in the past which is still not resolved), and other CPU-consuming plugins.

It'd be great if we could have a single format that is both human-readable and Logstash-friendly, so that it is easy for the support team to read in place and at the same time easy for Logstash to process. But if there is no such option, we can use several outputs from our applications: one for the support team for in-place inspection and another for Logstash.

Could you please recommend some log formats and backends?
Best regards,
Yegor

Produce JSON logs for Logstash consumption and let them contain all data that your logging framework can muster (filenames, line numbers, whatever). If you want human-readable logs on top of that you can produce plain text files in whatever format people prefer.
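For example (hypothetical fields), the same event could be emitted once as a JSON line for Logstash and once as a plain text line for people:

{"time":"2015-10-18T13:35:42Z","level":"INFO","event":"user_login","user":"alice"}
2015-10-18 13:35:42 INFO user_login user=alice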

Thank you for your reply.

So, do I understand you correctly? I write my logs to a text file in JSON format, with one JSON object per line per event, and all of the JSON object's fields will be transferred to the Logstash output JSON unchanged?

In this case the input part of the Logstash config might look like this:

input {
    file {
        codec => "json"
        delimiter => "\n" # keep default
        discover_interval => 15 # keep default
        exclude => "*.gz" # for example
        path => "/etc/app/log/*" # let's say we have /etc/app/log/access.log
        sincedb_path => "/var/lib/logstash/sincedb_access" # example path; must be a single file, not a glob
        sincedb_write_interval => 15
        start_position => "end"
        stat_interval => 0.1
        tags => ["easy_to_process"]
        type => "access_log"
    }
}

And the log file format will be something like the following?

{"method":"GET","path":"/","format":"html","controller":"welcome","action":"index","status":200,"duration":194.41,"view":184.79,"db":0.0,"time":"2015-10-18T13:35:42.757-07:00"}
{"method":"GET","path":"/users/sign_in","format":"html","controller":"users/sessions","action":"new","status":200,"duration":103.95,"view":88.8,"db":5.12,"time":"2015-10-18T13:35:46.873-07:00"}
....

And what about file rotation? If Logstash isn't fast enough to process all of the events in the file before it rotates, some events will be lost, right? How can we avoid that?

Or can I use another backend with JSON format?

And I can also use logstash-forwarder on the application server, right?
Will the config look like this?

input {
    lumberjack {
        port => ...
        ssl_certificate => ...
        ssl_key => ...
        codec => "json"
    }
}

Thanks a lot for your help.
Best regards,
Yegor

And the log file format will be something like the following?

Yes.

And what about file rotation? If Logstash isn't fast enough to process all of the events in the file before it rotates, some events will be lost, right? How can we avoid that?

I think this depends on how the files are rotated. Rotation via renames should be fine, but copy & truncate rotation is probably liable to lose messages.
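To illustrate the difference, a logrotate stanza (hypothetical paths; a sketch, not a tested recipe) that sticks to rename-based rotation could look like this:

/etc/app/log/*.log {
    daily
    rotate 7
    # rename + 'create' keeps the old file's inode, so the file input
    # can finish reading it before it's eventually deleted
    create
    # 'copytruncate' is the risky variant: events written between the
    # copy and the truncate can be lost
}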

Or can I use another backend with JSON format?

Not sure what you mean here.

And I can also use logstash-forwarder on the application server?

Sure, you can use any shipper.

Not sure what you mean here.

Sorry, maybe I'm not using the right word for it.

By "another backend with JSON format" I mean that maybe I can use a storage type for my logs other than a text file.

Maybe I can push events from my app directly into Redis or SQLite or something else, and then have Logstash pull them from there.

Is the file input reliable enough to use?

My concern is that if I'm using a file as the storage type for logs on my app's servers, I will end up using some transfer tool to ship the logs to a centralized log server. If, let's say, that tool is logstash-forwarder, it will treat each line of the log file as an event and add additional fields (like "file", "host", etc.) to the event before sending it to Logstash. So the initial JSON line may be wrapped in a "message" field. Will Logstash handle this correctly (if my assumption is correct)?

Could you also say a few words about the "s3" input type? Maybe it has some known issues? As I understand it, the mechanism is similar to the "file" input type, except that the file is in an S3 bucket and I only have to provide additional info to make it accessible to Logstash, right?

And the last one:

So, if I'm using the "file" input type, I provide the path to the log files with a glob in it (path => "/etc/app/log/*"), and instead of deleting the whole file's content when it would be truncated, I just create another file with a timestamp in its name and continue logging to the newly created file. The old files can then be deleted after some period of time. This technique will be loss-free, right?
Can I use it with logstash-forwarder as well? Or is there some better way to do it?

Thank you for your help.
Best regards,
Yegor

By "another backend with JSON format" I mean that maybe I can use a storage type for my logs other than a text file.

Maybe I can push events from my app directly into Redis or SQLite or something else, and then have Logstash pull them from there.

Oh, right. Sure, you can do that.
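If the application pushes each JSON event onto a Redis list, for example, a minimal input could look like this (the key name is just an example):

input {
    redis {
        host => "127.0.0.1"
        data_type => "list"
        key => "app_logs"
        codec => "json"
    }
}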

Is the file input reliable enough to use?

I would say so, yes.

So the initial JSON line may be wrapped in a "message" field. Will Logstash handle this correctly (if my assumption is correct)?

You'll probably need an extra json filter to unpack the inner JSON string.
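Something like this, assuming the shipper puts the original line in the message field:

filter {
    json {
        source => "message"
    }
}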

Could you also say a few words about the "s3" input type?

Not really; I haven't used it myself.

So, if I'm using the "file" input type, I provide the path to the log files with a glob in it (path => "/etc/app/log/*"), and instead of deleting the whole file's content when it would be truncated, I just create another file with a timestamp in its name and continue logging to the newly created file. The old files can then be deleted after some period of time. This technique will be loss-free, right?

Yes, as long as the file isn't deleted until Logstash has caught up.
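In that scheme the file input's glob just needs to match the timestamped names, e.g. (hypothetical naming):

input {
    file {
        path => "/etc/app/log/access-*.log"
        codec => "json"
    }
}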

Can I use it with logstash-forwarder as well?

Sure.


@magnusbaeck, I'm confused by the difference between your answer above and the description in
the file input plugin documentation: File input plugin | Logstash Reference [8.11] | Elastic

The documentation implies that if I have a fast log rate, I may lose some events with the usual file rotation mechanisms.

To summarize my question:
If my application generates logs at 100 events per second and rotates the log via a common file rotation mechanism, e.g. Linux's built-in logrotate or log4j's DailyRollingFileAppender, will the Logstash file input plugin lose events or not?

Thank you for your help.

If my application generates logs at 100 events per second and rotates the log via a common file rotation mechanism, e.g. Linux's built-in logrotate or log4j's DailyRollingFileAppender, will the Logstash file input plugin lose events?

I'm not quite sure except that it depends on the specifics.

  • Will the rotation be via renaming or copying/truncating? Or simply that the source program stops writing to the old file and creates a new file with (e.g.) today's date?
  • Which value of start_position?
  • Does the filename pattern cover rotated files?

This part of the file input documentation is insufficient and needs to be reviewed and extended by someone who understands the details of the Filewatch library.

Ok, given the following conditions:

  • Logs are rotated by rename, i.e., access.log => access.log.1; the rotated file keeps the same inode number.
  • start_position is default (end).
  • The filename pattern does not cover rotated files (should it?). Following the above case, the filename pattern is "access.log" (no "*").

Should I expect event loss under the above criteria? Thanks.

The last time I looked at the source code I got the impression that there was a narrow race condition where Logstash could discover that access.log was gone before it had read the last of the file, but I've misunderstood that code before so don't take my word for it. Also, with start_position => end it will start tailing the new access.log so any log entries written to the new file before Logstash's discovery interval elapses and Logstash reopens the file are probably toast.
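If you want to hedge against both issues, one option (again, a sketch; I haven't verified this against the Filewatch internals) is to let the pattern cover the rotated files and read newly discovered files from the beginning:

input {
    file {
        path => "/etc/app/log/access.log*" # also matches access.log.1
        start_position => "beginning" # only affects files without a sincedb entry
        codec => "json"
    }
}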

Is there any best practice for using the file input plugin without event loss in the general case? I can modify the log generator (log4j) to fit Logstash's "features".

Or should I use Filebeat?

Thanks.