Regarding Filebeat Output to a File on a Remote Server or NFS

Hi,

I've tried the following:

filebeat --> logstash --> NFS --> logstash --> Elasticsearch

Filebeat writes to the file in the following format:

"message":"2016/03/23 06:34:10 [INFO] /RPSW1: Adding spanky(DEVICE)","@version":"1","@timestamp":"2016-04-25T15:11:20.290Z","beat":{"hostname":"Ericsson","name":"Ericsson"},"count":1,"fields":null,"input_type":"log","offset":0,"source":"/var/log/RPSW1_20160323063413.log","type":"log","host":"Ericsson","tags":["beats_input_codec_plain_applied"]}

When we store the Filebeat data into a file via Logstash and then pass the file's data into Elasticsearch via a second Logstash, all the fields get merged into one field, which doesn't serve our purpose.

Reading the data from the file with the second Logstash instance does not send the lines correctly. I tried using the json_lines codec in logstash.conf to read the Filebeat data written to the file.

It's quite hard to follow you; there are quite a few parts in your setup.

  1. What's the exact config for each Filebeat and Logstash instance?
  2. How does the output look in between every step?
  3. Instead of sending to Elasticsearch, for testing use stdout on the last Logstash instance.

Hi Steffens,

There were some typos in the previous post.

Here are some sample formats.

Format in which Filebeat writes to the file:
{"message":"2016/03/23 06:34:10 [INFO] /RPSW1: Adding spanky(DEVICE)","@version":"1","@timestamp":"2016-04-25T15:11:20.290Z","beat":{"hostname":"Ericsson","name":"Ericsson"},"count":1,"fields":null,"input_type":"log","offset":0,"source":"/var/log/RPSW1_20160323063413.log","type":"log","host":"Ericsson","tags":["beats_input_codec_plain_applied"]}

Format in which Logstash reads the data from the file:

Logstash is taking the message as a single field.

It's taking the whole line as a single field and sending it to Elasticsearch, so our Elasticsearch queries are not working.

Is there any difference between Filebeat sending data to Logstash directly and Filebeat writing to a file that Logstash then reads?

What's the exact config for each Filebeat and Logstash instance?
Filebeat output is configured as a file.
Logstash takes input from the file and forwards it to Elasticsearch.
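
Roughly, it is along these lines (the paths and hosts below are only placeholders, not our real values):

filebeat.yml:

output:
  file:
    path: "/mnt/nfs/filebeat_out"        # placeholder: directory the file is written to
    filename: filebeat

logstash.conf (second instance):

input {
  file {
    path => "/mnt/nfs/filebeat_out/filebeat"   # placeholder: the same file, read over NFS
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]                # placeholder host
  }
}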

How does the output look in between every step?

Logstash output to file:
{"message":"root 19712 19709 0 02:08 ttyS0 00:00:00 grep dtpd","@timestamp":"2016-05-02T13:36:30.654Z","offset":366,"source":"/var/log/broncos_logs/dvt_1.txt","chs":"test","date":"02-05-2016"}

Logstash output to Elasticsearch:
"message": "{"message":"root 19712 19709 0 02:08 ttyS0 00:00:00 grep dtpd","@timestamp":"2016-05-02T13:36:30.654Z","offset":366,"source":"/var/log/broncos_logs/dvt_1.txt","chs":"test","date":"02-05-2016"}"

Instead of sending to Elasticsearch, for testing use stdout on the last Logstash instance.

We have checked the output in the Logstash instance; it shows the same as the screenshot given in the previous reply.

The last Logstash instance, which as I understand it reads a file containing

{"message":"root 19712 19709 0 02:08 ttyS0 00:00:00 grep dtpd","@timestamp":"2016-05-02T13:36:30.654Z","offset":366,"source":"/var/log/broncos_logs/dvt_1.txt","chs":"test","date":"02-05-2016"}

needs to use the json codec in its file input.
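
For example, something along these lines (the path is only a placeholder):

input {
  file {
    path => "/mnt/nfs/filebeat_out/filebeat"   # placeholder: the file written by the previous step
    codec => "json"                            # parse each line as a JSON event instead of plain text
  }
}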

The log output is pretty funny: it's a JSON event embedded inside another JSON event.

If Filebeat is writing to a file, the content is JSON. The first Logstash instance reading from the file written by Filebeat (why is this instance even required?) needs the json codec in order to parse the event.

Yes, we tried using the json codec on the file input to the second Logstash instance, but it's not working. As mentioned earlier, we are trying to read from a file on an NFS mount due to firewall issues; that is why we are using two instances.

But we have not yet tested reading from the NFS-mounted file; we have only tested reading from a local log file.

Yes, we tried using the json codec on the file input to the second Logstash instance, but it's not working.

Exactly what did you try? What happened? Was there something in the logs? Can you provide a minimal example that exhibits the incorrect behavior?

@vivc, please provide the input/output configs of the Filebeat and Logstash instances. If warnings/errors are printed by Filebeat/Logstash, please include them. To be honest, we're having a hard time understanding exactly what you are doing.

I still don't understand why you need filebeat->logstash->NFS instead of filebeat->NFS.

Sorry for that. We are reading the data using Filebeat and sending it to Logstash, where we apply filters so that the data is parsed in real time, and then write it to a file. The other Logstash instance, which is installed on another server separated by a firewall, reads the data from the NFS file and stores it in Elasticsearch.

But the message written to the file is shown as a JSON event embedded inside another JSON event. We have set the codec to json while reading the data from the file into Logstash, but it is still read as a single field and sent to Elasticsearch.

Currently we are able to resolve it by applying filters to the Filebeat-read data to recover the original message, and then applying our own filters on top of that.

I'm not sure if you followed me. Thanks for your support.

Can you share the config files?

Sorry for the delayed response.

After pumping data from Filebeat to the file, the following grok pattern was used to retrieve the exact message.

# Pull the embedded fields out of the raw Filebeat JSON line with a regex
grok {
  break_on_match => false
  match => { "message" => [ "\"message\":\"(?<my_msg>.*)\"\,\"@timestamp\":\"(?<timestamp>.*)\"\,\"chassis\":\"(?<chs>.*)\"\,\"offset\":(?<offset>.*)\,\"source\":\"(?<source>.*)\"" ] }
}

# If the embedded message was extracted, make it the event's message field
if [my_msg] {
  mutate {
    replace      => [ "message", "%{my_msg}" ]
    remove_field => [ "my_msg" ]
  }
}

Then the data is filtered correctly.

Seriously, don't do this. It's a super fragile approach. Filebeat stores data in JSON format, so use the json filter to parse the Filebeat output. Using grok on raw JSON (without decoding first) might give you some headaches later on, due to JSON potentially escaping strings: e.g. having a " in your original message might break your grok pattern. Plus, JSON does not demand field order, but grok (basically being regexes on steroids) does. I'm a little baffled this works across Filebeat restarts; I'd expect field order to be randomized between different runs.
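
For example, a minimal sketch (assuming the whole Filebeat JSON document ends up in the message field):

filter {
  json {
    source => "message"   # decode the embedded JSON and merge its fields into the event
  }
}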

Thanks for the information. We tried the json filter to parse the Filebeat output, but that didn't work, as I mentioned earlier. Maybe we'll try it again and let you know.

Dear Magnus,

Could you please elaborate here?

That said, networked file systems are notorious for having different
edge case behavior and Logstash is usually not used for reading files
from NFS.

We use a setup in which Logstash consumes data from an NFS share, so we would be very grateful for any further pointers on this.

Thanks
Joao

Here are some notes for network shares: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-network-volumes.html The main problem for Filebeat is that file identifiers can change even though it stays the same file. In addition, sometimes file metadata is cached and not immediately updated. It is recommended to install Filebeat on each node.