I'm looking at developing this - wrapping a log parsing library to integrate into Filebeat.
The challenge is matching up start/end log entries, together with various intervening entries for some commands (the log format is a little idiosyncratic).
Optionally, it will also handle Perforce server structured logging (a much more regular format, although still with start/end entries requiring matching) - but that's down the track.
I did consider creating a custom Beat, but it seems a custom module will make it easier to hook into the logic that reads log files appropriately.
So I'm planning a shared library which will parse a stream of log lines and spit out JSON entries as appropriate. This library will be called by the new module (as well as by existing standalone analysis utils).
One challenge is that the module will maintain a list of current entries for which it has a start but not yet an end record. This list needs to be saved somewhere when the service is stopped, and then read again on startup when it restarts log processing. I'm not yet sure of the best way to do such save/restore of state.
Thoughts and ideas, pointers welcome!
So you would like to read, parse and forward the logs of Perforce, is that correct? Could you share a few example logs?
If I understood you correctly, you need to develop a Filebeat module.
Filebeat could read log lines for you and aggregate multiline messages into a single one. The messages could be forwarded to Elasticsearch, which does the parsing for you. The progress of sending/reading logs is tracked by Filebeat, so there is no data duplication in the output.
Let me know if you need further help.
The following isn't behaving well with formatting...
Perforce server info: 2017/12/07 15:00:21 pid 148469 Fred@LONWS 10.40.16.14/10.40.48.29 [3DSMax/188.8.131.52] 'user-change -i' trigger swarm.changesave lapse .044s
Perforce server info: 2017/12/07 15:00:21 pid 148469 completed .413s 7+4us 0+584io 0+0net 4580k 0pf
Perforce server info: 2017/12/07 15:00:21 pid 148469 Fred@LONWS 10.40.16.14/10.40.48.29 [3DSMax/184.108.40.206] 'user-change -i' --- lapse .413s --- usage 7+4us 0+592io 0+0net 4580k 0pf --- rpc msgs/size in+out 3+5/0mb+0mb himarks 318788/2096452 snd/rcv .000s/.052s --- db.counters --- pages in+out+cached 6+3+2 --- locks read/write 0/2 rows get+pos+scan put+del 2+0+0 1+0
Rather like Auditbeat, I would want the module to match up start and end records, extract other information where present (e.g. the --- lines above) and return the fields, then provide single cleaned-up records to Elasticsearch.
There is usually a completed record with the same pid (and other fields) which denotes the end of the record. But in some cases extra information is attached (known as track info) which records db lock info etc. The parsing is a bit tricky, but it's an understood problem. The question is where in the pipeline it is best to do the matching of start/end records. This is where I assume a custom module is good.
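The start/end matching can be sketched as a small state machine keyed on pid: hold the start record until the matching completed line arrives, then emit one merged event. A rough illustration only - the regexes and field names below are simplified assumptions, not the real log grammar (which has track info, trigger lapse lines and other variants):

```go
package main

import "regexp"

// Simplified patterns; the real Perforce log format has more
// variants (track info, "---" usage lines, etc.).
var (
	startRe     = regexp.MustCompile(`pid (\d+) (\S+)@(\S+) .* '([^']+)'`)
	completedRe = regexp.MustCompile(`pid (\d+) completed ([\d.]+)s`)
)

// Matcher pairs start lines with their completed lines by pid.
type Matcher struct {
	pending map[string]map[string]string // pid -> partial event
}

func NewMatcher() *Matcher {
	return &Matcher{pending: make(map[string]map[string]string)}
}

// Feed consumes one log line. It returns a merged event when a
// start/completed pair has been matched, or nil otherwise.
func (m *Matcher) Feed(line string) map[string]string {
	if c := completedRe.FindStringSubmatch(line); c != nil {
		if ev, ok := m.pending[c[1]]; ok {
			delete(m.pending, c[1])
			ev["lapse"] = c[2]
			return ev
		}
		return nil // completed line with no matching start
	}
	if s := startRe.FindStringSubmatch(line); s != nil {
		m.pending[s[1]] = map[string]string{
			"pid": s[1], "user": s[2], "host": s[3], "cmd": s[4],
		}
	}
	return nil
}
```

Whatever holds `pending` here is exactly the state that needs saving on shutdown, which is why keeping the matcher in a shared library (callable from both the module and the standalone utils) seems attractive.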
Only downside I can see is that a custom module is then included whole. It is less standalone than a custom Beat.
Hmmm. I now realise that a Filebeat module ships all lines off to a processor.
I probably do want to do a custom Beat then - which pre-processes the log and ships off pre-processed events.
Unless I have misunderstood
You could aggregate log lines into multiline events. I looked at the example logs, and it seems like the last line of every multiline event starts with ---. Is that correct?
If yes, you could configure a multiline pattern so these lines are aggregated into the same event by Filebeat. Then you could do the processing in ES using the processors of the Ingest node: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest-processors.html
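For reference, a multiline section in filebeat.yml along those lines might look like the following - the pattern here is a guess based on the samples above, not a tested expression for the real log format:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /p4/logs/log   # example path, adjust to your server log location
    # Treat every line starting with "Perforce server info:" as the
    # beginning of an event; any line NOT matching the pattern is a
    # continuation and gets appended to the previous event.
    multiline.pattern: '^Perforce server info:'
    multiline.negate: true
    multiline.match: after
```

This only groups physically adjacent lines, though - it cannot pair a start line with a completed line that arrives after unrelated entries from other pids, which is why the matching question above matters.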
If not: Filebeat does not support multiple multiline configurations, so you might need to solve aggregating lines differently. But if I were you, and it was possible, I would stick with a Filebeat module, because it requires less development, so it can be done more quickly.
Or is it a strict requirement to do preprocessing before sending the event to ES?
Unfortunately the format is not that regular.
I'm not worried about the log parsing, but I am looking for ways to avoid reinventing the wheel. So in a custom Beat, it would be great to build on top of the Harvester and similar libraries, e.g. to take advantage of things like storing the offset within a file, detecting renames, etc. But I am not sure I can sensibly use just those bits of functionality.
As an alternative to developing a new Beat, you could implement a new processor. The processor would aggregate the events and flush them when the whole Perforce event has reached the processor. See more: https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html