Filebeat multiline: closing tag of the last XML document is not ingested when the log file has no trailing newline

I have log files in XML format, and I'm using Filebeat to collect these files and push them to a Kafka topic. The XML files end without a line feed, so Filebeat's multiline handling never forwards the last line of the XML to the Kafka topic.

I'm using Filebeat 6.6.0

My XML logs look like this:

<employees>
    <employee id="111">
        <firstName>Manoj</firstName>
        <lastName>Sinha</lastName>
        <location>India</location>
    </employee>
    <employee id="222">
        <firstName>Alex</firstName>
        <lastName>Gussin</lastName>
        <location>Russia</location>
    </employee>
</employees>
<employees>
    <employee id="111-11">
        <firstName>Pihu</firstName>
        <lastName>Sinha</lastName>
        <location>India</location>
    </employee>
    <employee id="222-22">
        <firstName>Alex</firstName>
        <lastName>Max</lastName>
        <location>USA</location>
    </employee>
</employees>

My filebeat.yml looks like this

    filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - C:\WorkSpace\Filebeat\logs\*.xml
        
      input_type: log
      document_type: xml
      encoding: UTF-8
        
      multiline.pattern: '^[[:space:]]|^</employees>'
      multiline.negate: false
      multiline.match: after
      
    #----------------------------- Kafka output --------------------------------
    output.kafka:
      hosts: ["localhost:9092"]
      topic: "Topic1"
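
To check the pattern separately from the file-reading problem, the grouping rule can be simulated outside Filebeat. This is only a rough sketch of the documented behaviour of `negate: false` with `match: after` (a matching line is appended to the event before it), not Filebeat's actual implementation. `group_multiline` is a hypothetical helper, and `[[:space:]]` is rewritten as `\s` because Python's `re` module has no POSIX character classes.

```python
import re

# Same pattern as multiline.pattern above, with [[:space:]] written as \s.
PATTERN = re.compile(r"^\s|^</employees>")


def group_multiline(lines, pattern=PATTERN):
    """Append every line that matches the pattern to the previous event."""
    events = []
    for line in lines:
        if pattern.search(line) and events:
            events[-1].append(line)   # continuation of the current event
        else:
            events.append([line])     # non-matching line starts a new event
    return ["\n".join(event) for event in events]


# Shortened version of the sample log, with every line fully available.
sample = [
    "<employees>",
    '    <employee id="111">',
    "    </employee>",
    "</employees>",
    "<employees>",
    '    <employee id="111-11">',
    "    </employee>",
    "</employees>",
]

events = group_multiline(sample)
print(len(events))
print(events[-1])
```

Under these assumptions the pattern yields one event per `<employees>` document, including the final closing tag, which suggests the pattern is not what drops the tag. The last line has to reach the multiline stage first.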

Output on the Kafka topic; the second event is missing the closing tag of the last XML document (</employees>):

{"@timestamp":"2019-06-03T13:33:21.002Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"beat":{"hostname":"LP-5CD812F3VC","version":"6.6.0","name":"LP-5CD812F3VC"},"host":{"name":"LP-5CD812F3VC"},"source":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml","offset":0,"log":{"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml"},"flags":["multiline"]},"message":"\u003cemployees\u003e\n    \u003cemployee id=\"111\"\u003e\n        \u003cfirstName\u003eManoj\u003c/firstName\u003e\n        \u003clastName\u003eSinha\u003c/lastName\u003e\n        \u003clocation\u003eIndia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"222\"\u003e\n        \u003cfirstName\u003eAlex\u003c/firstName\u003e\n        \u003clastName\u003eGussin\u003c/lastName\u003e\n        \u003clocation\u003eRussia\u003c/location\u003e\n    \u003c/employee\u003e\n\u003c/employees\u003e","prospector":{"type":"log"},"input":{"type":"log"}}
{"@timestamp":"2019-06-03T13:33:21.003Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"message":"\u003cemployees\u003e\n    \u003cemployee id=\"111-11\"\u003e\n        \u003cfirstName\u003ePihu\u003c/firstName\u003e\n        \u003clastName\u003eSinha\u003c/lastName\u003e\n        \u003clocation\u003eIndia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"222-22\"\u003e\n        \u003cfirstName\u003eAlex\u003c/firstName\u003e\n        \u003clastName\u003eMax\u003c/lastName\u003e\n        \u003clocation\u003eUSA\u003c/location\u003e\n    \u003c/employee\u003e","source":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml","offset":332,"log":{"flags":["multiline"],"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml"}},"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"LP-5CD812F3VC","hostname":"LP-5CD812F3VC","version":"6.6.0"},"host":{"name":"LP-5CD812F3VC"}}

I tried every combination of settings I could think of, but Filebeat never forwards the last line of the XML to the Kafka topic. When I manually add a space or a newline at the end of the XML file, it works as expected, but I can't edit the XML log files by hand.

@magnusbaeck, @andrewkroh, @pierhugues, @ruflin, can you please help me with this?

Regards, Manoj

As you already found out, the key is the newline character at the end of the file: without it, Filebeat does not treat the last line as a complete log line. The best fix would be to adjust the tool that writes your log file so that it terminates the last line with a newline.
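
To find out which files are affected before changing the writer, a small check like the following could help. It is a minimal sketch, not part of Filebeat: `ends_with_newline` is a hypothetical helper that only inspects the last byte of a file.

```python
import os


def ends_with_newline(path):
    """Return True if the last byte of the file is a line feed."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False              # empty file: no terminated line at all
        f.seek(-1, os.SEEK_END)       # step back one byte from the end
        return f.read(1) == b"\n"     # also true for Windows CRLF endings
```

For example, `ends_with_newline(r"C:\WorkSpace\Filebeat\logs\Test.xml")` would be expected to return `False` for the file described in this thread, since it ends directly after the closing tag.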

Thanks @ruflin for the confirmation!

I'm facing one more issue: Filebeat always returns Unicode escape codes (\u003c and \u003e) instead of the XML angle brackets in the message field. I tried on Linux as well, but the issue persists there. I tried every combination of settings I could think of, but I still can't figure this out.

"message":"\u003cemployees\u003e\n \u003cemployee id=\"111\"\u003e\n \u003cfirstName\u003eManoj\u003c/firstName\u003e\n \u003clastName\u003eSinha\u003c/lastName\u003e\n \u003clocation\u003eIndia\u003c/location\u003e\n \u003c/employee\u003e\n \u003cemployee id=\"222\"\u003e\n

{"@timestamp":"2019-06-03T13:33:21.002Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"beat":{"hostname":"LP-5CD812F3VC","version":"6.6.0","name":"LP-5CD812F3VC"},"host":{"name":"LP-5CD812F3VC"},"source":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml","offset":0,"log":{"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml"},"flags":["multiline"]},"message":"\u003cemployees\u003e\n    \u003cemployee id=\"111\"\u003e\n        \u003cfirstName\u003eManoj\u003c/firstName\u003e\n        \u003clastName\u003eSinha\u003c/lastName\u003e\n        \u003clocation\u003eIndia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"222\"\u003e\n        \u003cfirstName\u003eAlex\u003c/firstName\u003e\n        \u003clastName\u003eGussin\u003c/lastName\u003e\n        \u003clocation\u003eRussia\u003c/location\u003e\n    \u003c/employee\u003e\n\u003c/employees\u003e","prospector":{"type":"log"},"input":{"type":"log"}}
{"@timestamp":"2019-06-03T13:33:21.003Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.6.0","topic":"Topic1"},"message":"\u003cemployees\u003e\n    \u003cemployee id=\"111-11\"\u003e\n        \u003cfirstName\u003ePihu\u003c/firstName\u003e\n        \u003clastName\u003eSinha\u003c/lastName\u003e\n        \u003clocation\u003eIndia\u003c/location\u003e\n    \u003c/employee\u003e\n    \u003cemployee id=\"222-22\"\u003e\n        \u003cfirstName\u003eAlex\u003c/firstName\u003e\n        \u003clastName\u003eMax\u003c/lastName\u003e\n        \u003clocation\u003eUSA\u003c/location\u003e\n    \u003c/employee\u003e","source":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml","offset":332,"log":{"flags":["multiline"],"file":{"path":"C:\\WorkSpace\\Filebeat\\logs\\Test.xml"}},"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"LP-5CD812F3VC","hostname":"LP-5CD812F3VC","version":"6.6.0"},"host":{"name":"LP-5CD812F3VC"}}

Can you please look into this and help me figure out the issue?

Regards, Manoj
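
A note on the escape codes: `\u003c` and `\u003e` are standard JSON string escapes for `<` and `>`, so the XML is not corrupted, only escaped in the serialized event. Any JSON parser on the consumer side should restore the original characters. The sketch below illustrates this with a shortened copy of the `message` field from the output above; the variable names are illustrative only.

```python
import json

# Shortened copy of one event as it arrives on the Kafka topic.
raw_event = r'{"message":"\u003cemployees\u003e\n    \u003cemployee id=\"111\"\u003e"}'

event = json.loads(raw_event)   # decoding resolves the \uXXXX escapes
message = event["message"]
print(message)
```

If the escapes have to be avoided in the serialized output itself, the JSON output codec documents an `escape_html` setting. Whether it is available and what it defaults to in 6.6.0 has not been verified here and should be checked against the output codec documentation for that version.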
