Multiline read in filebeat


(Aditya Srivastava) #1

I have a custom log file in which some entries span multiple lines. Those continuation lines share no common pattern; their content is arbitrary. What kind of parser or setting should I put in the Filebeat config file? How do I read those multiline entries?
Example log file:

Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] INFO  [application level detail. Got response
Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] DEBUG [application-level detail. Response from app is :: {}029B<?xml version="1.0" encoding="UTF-8" ?>
                        
       <Engine>
                <Header>
                        <Version>1.0</Version>
                        <App>ABC</App>
                        <TargetApp>DEF</TargetApp>
                        <Count>161117162528</MsgId>
                        <TimeStamp>2016-11-17T16:25:32.313+05:30</TimeStamp>
                </Header>
                <Body>
                        <AuthRes>
                                <Msgcount>0810</Msgcount>
                                <countDate>20161117</countDate>
                        </AuthRes>
                </Body>
        </Engine>
Nov 17 16:25:30 1.2.3.4 app [App:16:25:28,116] INFO  [application level data]

(Andrew Kroh) #2

I think you could use the timestamp to identify where new log lines start. See the Timestamp example in the documentation.


(Aditya Srivastava) #3

Are you talking about this timestamp? If so, what about the other entries? How do I get all of them as one message?


(Andrew Kroh) #4

No, I'm talking about the timestamp from the logger.


(Steffen Siering) #5

Have a look at your logs. They are mostly plain text, with each entry starting with the month, day of month, and time. The multiline content is always indented. For example, check whether a line starts with one of the characters in ^[JFMASOND] (the regex covers the first letter of every month name). Alternatively, check whether a line is empty or starts with a space or tab.
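
A minimal Python sketch of that rule, to show which lines end up in the same event. It only illustrates the grouping logic and is not Filebeat's implementation; the function name and the shortened sample are made up for illustration. In Filebeat itself the same rule is normally expressed through the multiline options (multiline.pattern: '^[JFMASOND]', multiline.negate: true, multiline.match: after); treat those exact values as an assumption to check against the documentation for your version.

```python
import re

# Assumption: a new log entry starts with the first letter of a month name;
# every other line (indented XML, blank lines) continues the previous entry.
ENTRY_START = re.compile(r"^[JFMASOND]")


def group_multiline(lines):
    """Merge continuation lines into the entry that precedes them."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not events:
            events.append(line)            # start of a new entry
        else:
            events[-1] += "\n" + line      # continuation of the current entry
    return events


# Shortened version of the sample log from the first post.
sample = [
    "Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] INFO  [application level detail. Got response",
    'Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] DEBUG [application-level detail. Response from app is :: {}029B<?xml version="1.0" encoding="UTF-8" ?>',
    "",
    "       <Engine>",
    "        </Engine>",
    "Nov 17 16:25:30 1.2.3.4 app [App:16:25:28,116] INFO  [application level data]",
]

events = group_multiline(sample)
for event in events:
    print(repr(event))
```

The second event keeps its embedded newlines, so the XML block travels with the DEBUG line that introduced it.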


(Aditya Srivastava) #6

Thank you so much for the reply. Now I get the desired output, with all the lines of a multiline entry as one message. The only problem now is that every newline shows up as \n and all the spaces show up as \t.

Example:
{"@timestamp":"2016-11-17T14:09:33.670Z","beat":{"hostname":"ip-10-0-0-9","name":"ip-10-0-0-9","version":"5.0.0"},"input_type":"log","message":"Nov 17 19:39:32 RCPPPCFWASN1 Wallet_App__access [WALLET:19:39:28,126] DEBUG [application-akka.actor.default-dispatcher-35959][SwitchPaymentResponseActor.java:45] Response from switch is :: {}029B\u003c?xml version="1.0" encoding="UTF-8" ?\u003e\n\t\u003cEngine\u003c/Header\u003e\n\t\t\u003cBody\u003e\n\t\t\t\u003cAuthRes\u003e\n\t\t\t\t\u003cMsgType\u003e0810\u003c/MsgType\u003e\n\t\t\t\t\u003cTranDate\u003e20161117\u003cmber\u003e\n\t\t\t\t\u003countDate\u003e20161117\u003\n\t\t\t\u003c/AuthRes\u003e\n\t\t\u003c/Body\u003e\n\t\u003c/Engine\u003e","offset":30024253,"source":"/res/1.2.3.4/App/2016-11-17.log",}


(Stefan Thies) #7

You could use @sematext/logagent instead of Filebeat, Logstash, etc.
It is a lightweight log shipper built with Node.js.

It can parse your file out of the box:

sudo npm i -g @sematext/logagent
cat test.log | logagent --yaml 
@timestamp: Thu Nov 17 2016 15:21:47 GMT+0100 (CET)
message:    Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] INFO  [application level detail. Got response
logSource:  unknown

@timestamp: Thu Nov 17 2016 15:21:47 GMT+0100 (CET)
message: 
  """
    Nov 17 16:25:30 1.2.3.4 appData [app:16:25:28,115] DEBUG [application-level detail. Response from app is :: {}029B<?xml version="1.0" encoding="UTF-8" ?>
                            
           <Engine>
                    <Header>
                            <Version>1.0</Version>
                            <App>ABC</App>
                            <TargetApp>DEF</TargetApp>
                            <Count>161117162528</MsgId>
                            <TimeStamp>2016-11-17T16:25:32.313+05:30</TimeStamp>
                    </Header>
                    <Body>
                            <AuthRes>
                                    <Msgcount>0810</Msgcount>
                                    <countDate>20161117</countDate>
                            </AuthRes>
                    </Body>
            </Engine>
  """
logSource:  unknown

@timestamp: Thu Nov 17 2016 15:21:48 GMT+0100 (CET)
message:    Nov 17 16:25:30 1.2.3.4 app [App:16:25:28,116] INFO  [application level data]
logSource:  unknown

To ship logs to Elasticsearch, use:

logagent -e http://localhost:9200 -i logs /var/log/*.log

More info here: https://sematext.com/logagent/


(Aditya Srivastava) #8

Thank you, sir, I will look into it. We also use Logstash for filtering and parsing the data. Can that be done with Logagent?


(Stefan Thies) #9

Yes. You can define custom patterns to structure logs, and apply JavaScript functions or SQL queries to them before you ship the structured and aggregated logs to Elasticsearch.

Various log formats are supported out of the box (nginx, system logs, mongodb, elasticsearch, hadoop, kafka ...).
Example pattern: http://sematext.github.io/logagent-js/parser/


(Aditya Srivastava) #10

How do we use Kafka with Logagent? Can we take the discussion to a new thread?


(Stefan Thies) #11

We could create a Kafka output plugin, but you might not need it: Logagent has a disk buffer (used when the connection to Elasticsearch fails) and retransmits logs once Elasticsearch is available again.

You can ask questions on GitHub (https://github.com/sematext/logagent-js) or in the new forum: https://groups.google.com/forum/#!forum/logagent


(Aditya Srivastava) #12

Hi. I tried the solution you gave. The pattern works as expected when I test it against my content on the Go Playground. But when I use it in my Filebeat config, the output shows \n for newlines and \t for spaces. Please help me out with this.


(Stefan Thies) #13

Sorry, I can't help with Filebeat mixing up whitespace :(

Logagent pattern definition would look like this:

patterns:
 - # multiline with blockStart regex
  sourceName: !!js/regexp /mylogs/
  blockStart: !!js/regexp /^\S+\s\d+\s\d\d:\d\d:\d\d/
  match:
    - type: mylogs
      regex: !!js/regexp /^(\S+\s\d+\s\d\d:\d\d:\d\d)\s(\S+)\s(\S+)\s\[.+\]\s(\S+)\s([\S|\s]+)/
      fields:
        - ts
        - ip_address:string
        - app:string
        - severity:string
        - message:string
      dateFormat: MMM D HH:mm:ss

You could also extract fields from the XML inside, with a bit more work on the regex.
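
To see how the capture groups in that regex line up with the listed fields, here is a rough Python sketch. It reuses the regex from the pattern above unchanged, but it runs under Python's re module rather than Logagent's JavaScript engine, so take it as an approximation; the FIELDS list and variable names are illustrative only.

```python
import re

# The `regex` from the Logagent pattern, copied as-is. The bracketed
# "[App:16:25:28,116]" part is matched but not captured.
LINE = re.compile(
    r"^(\S+\s\d+\s\d\d:\d\d:\d\d)\s(\S+)\s(\S+)\s\[.+\]\s(\S+)\s([\S|\s]+)"
)
# Field names in the same order as the capture groups.
FIELDS = ["ts", "ip_address", "app", "severity", "message"]

line = "Nov 17 16:25:30 1.2.3.4 app [App:16:25:28,116] INFO  [application level data]"

match = LINE.match(line)
record = dict(zip(FIELDS, match.groups())) if match else {}
for name, value in record.items():
    print(f"{name}: {value!r}")
```

Because the sample line has two spaces after the severity, the captured message starts with a leading space; trimming it is left to the consumer.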


(Andrew Kroh) #14

This is normal. The data is put into a JSON object, and since newlines and tab characters are special characters, they need to be encoded. You'll notice that quotes in the message are escaped as well.

What problem is this causing you?
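
A small Python sketch of the same effect, using a made-up message rather than the real log entry. It shows that \n and \t exist only in the serialized JSON text and that decoding restores the original characters. The \u003c and \u003e sequences in the Filebeat output are the same kind of escape for < and >; Python's json module does not produce them by default, so the last line decodes one by hand.

```python
import json

# Hypothetical multiline message, similar in shape to the one in the thread.
message = 'Response is :: <?xml version="1.0" ?>\n\t<Engine>\n\t</Engine>'

# Serializing replaces the real newline, tab, and quote characters with escapes.
encoded = json.dumps({"message": message})
print(encoded)

# Parsing the JSON again restores the original characters.
decoded = json.loads(encoded)["message"]
print(decoded)

# Unicode escapes decode the same way.
angle = json.loads('"\\u003cEngine\\u003e"')
print(angle)
```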


(Aditya Srivastava) #15

Yeah, you were right. The final message I get in ES is in the proper format, except that there are no line breaks: the whole message appears on one line, separated by spaces. But I can still work with it. Thanks.


(Steffen Siering) #16

That makes me wonder how you display your message. Are you viewing it in a browser, e.g. in Kibana? The newline characters are included in the stored string (the JSON encoding uses \n for newlines). But when the string is displayed in a browser as plain HTML (without transforming \n to <br> or wrapping it in a <pre> tag), the newlines are rendered as ordinary spaces.
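
As a rough illustration of the two display options mentioned here, the following Python sketch builds both HTML variants from a made-up message. How Kibana actually renders the field is not covered; the variable names are hypothetical.

```python
import html

# Hypothetical stored message containing real newline and tab characters.
message = "Response is ::\n\t<Engine>\n\t</Engine>"

# Escape <, > and & so the XML is shown as text instead of being parsed as tags.
escaped = html.escape(message)

# Option 1: wrap in <pre>, which tells the browser to keep whitespace as stored.
preformatted = "<pre>" + escaped + "</pre>"

# Option 2: turn every newline into an explicit line break.
with_breaks = escaped.replace("\n", "<br>\n")

print(preformatted)
print(with_breaks)
```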

