Filebeat behavior


(fjemli) #1

I have an XML file that another program updates at arbitrary times by appending new documents to it.
This file will also be initialized every day by depopulating it.

I configured Filebeat to pick up every XML document inside this file that matches the format <H_Ticket>...</H_Ticket>, using this configuration:
    filebeat:
      # List of prospectors to fetch data.
      prospectors:
        -
          paths:
            - C:\busesdata\*.xml
          input_type: log
          exclude_lines: ["^.*xml"]
          #ignore_older: 10s
          #close_older: 1h
          document_type: ticket
          scan_frequency: 15s
          multiline:
            pattern: '<H_Ticket'
            negate: true
            match: after
    output:
      ### Logstash as output
      logstash:
        hosts: ["localhost:5044"]
        index: filebeat

It works very well when adding many XML docs at the end of the file, but it sends an empty event when adding a single document, for example:

<H_Ticket>ticket1</H_Ticket> <H_Ticket>ticket2</H_Ticket>

=> it works properly

<H_Ticket>ticket</H_Ticket>

=> empty event

  • First, is this behavior due to a wrong multiline setting, some other misconfiguration, or something else?
  • Second, in my case, do I have to use the ignore_older and close_older params to guarantee a smooth pipeline or not? If yes, how should they be set in my case?

Thank you in advance


(ruflin) #2

Do you have a new line at the end of the single event? I'm somewhat surprised that an empty event is sent. Be aware that multiline.timeout: 5s will apply to the last event in a file as long as no new event is added.

Are the events appended to the file identical for single or combined events?

What do you mean by "initialized"? Is the same file truncated, or is it deleted and a new one created with the same name?


(fjemli) #3

Thank you

By empty event I mean an event generated by Filebeat that looks like this: {}
I didn't know about the multiline.timeout option. Is this a new configuration option?

Yes, the events are identical, but the first appended event is always parsed as an empty event by Filebeat, whether 1 or more events are added to the file.

By initialized I mean that the same file will be totally depopulated.


(ruflin) #4

Is the full event just {}, or only the message? The timestamp and some basic beat info, for example, should always be sent.

Here you can find all the docs for multiline, including timeout: https://www.elastic.co/guide/en/beats/filebeat/1.2/configuration-filebeat-options.html#multiline I thought timeout had existed since multiline was introduced.

Sorry to ask again, but does depopulated mean the file is truncated, i.e. all content inside the file is removed?

Can you share 2 full events? That will make it easier to see if there is perhaps something wrong with multiline or exclude_lines. Did you ever remove exclude_lines and check if everything works as expected?


(fjemli) #5

Sorry, I'm an ES newbie. By depopulated I mean that the program will remove all content inside the XML file.

Following is a sample content from the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="ticket.xsl" type="text/xsl"?>
<HF_DOCUMENT>
	<H_Ticket>
		<IDH_Ticket>31</IDH_Ticket>
		<CodeBus>186</CodeBus>
		<CodeCh>5531</CodeCh>
		<CodeConv>5531</CodeConv>
		<Codeligne>12</Codeligne>
		<Date>20151217</Date>
		<Heur>1214</Heur>
		<NomFR1>SOUK AHAD</NomFR1>
		<NomFR2>CHOTT MERIEM </NomFR2>
		<Prix>0.8</Prix>
		<IDTicket>31</IDTicket>
		<CodeRoute>107</CodeRoute>
		<origine>01</origine>
		<Distination>09</Distination>
		<Num>1</Num>
		<Ligne>107</Ligne>
		<requisition> </requisition>
		<voyage>0</voyage>
		<faveur> </faveur>
	</H_Ticket>
	<H_Ticket>
		<IDH_Ticket>32</IDH_Ticket>
		<CodeBus>186</CodeBus>
		<CodeCh>5531</CodeCh>
		<CodeConv>5531</CodeConv>
		<Codeligne>12</Codeligne>
		<Date>20151217</Date>
		<Heur>1214</Heur>
		<NomFR1>SOUK AHAD</NomFR1>
		<NomFR2>SOVIVA </NomFR2>
		<Prix>0.66</Prix>
		<IDTicket>32</IDTicket>
		<CodeRoute>107</CodeRoute>
		<origine>01</origine>
		<Distination>07</Distination>
		<Num>2</Num>
		<Ligne>107</Ligne>
		<requisition> </requisition>
		<voyage>0</voyage>
		<faveur> </faveur>
	</H_Ticket>
	<H_Ticket>
		<IDH_Ticket>33</IDH_Ticket>
		<CodeBus>186</CodeBus>
		<CodeCh>5531</CodeCh>
		<CodeConv>5531</CodeConv>
		<Codeligne>12</Codeligne>
		<Date>20151217</Date>
		<Heur>1215</Heur>
		<NomFR1>SOUK AHAD</NomFR1>
		<NomFR2>KANTAOUI </NomFR2>
		<Prix>0.66</Prix>
		<IDTicket>33</IDTicket>
		<CodeRoute>107</CodeRoute>
		<origine>01</origine>
		<Distination>06</Distination>
		<Num>1</Num>
		<Ligne>107</Ligne>
		<requisition> </requisition>
		<voyage>0</voyage>
		<faveur> </faveur>
	</H_Ticket>
</HF_DOCUMENT>

(ruflin) #6

Sorry for the late reply. Just to be sure: you don't want everything between <HF_DOCUMENT> and </HF_DOCUMENT> in one event, but each <H_Ticket>...</H_Ticket> sub-entry as its own event? I assume every file starts like this, so the first three lines and the last line should never be sent?
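
To make the question about the header and footer lines concrete, here is a rough Python sketch of how the multiline settings from the first post (pattern '<H_Ticket', negate: true, match: after) would be expected to group a shortened version of the sample file. It is a simplified model, not Filebeat's actual code. The function name group_multiline and the shortened sample are made up for illustration, and it assumes exclude_lines is applied to the combined multiline event, with the regex only able to match on the first line of that event.

```python
import re


def group_multiline(lines, pattern):
    """Simplified model of multiline with negate: true, match: after.

    A line matching `pattern` starts a new event; every following line
    that does not match is appended to the current event.
    This is an illustration only, not Filebeat's implementation.
    """
    regex = re.compile(pattern)
    events, current = [], []
    for line in lines:
        if regex.search(line) and current:
            # A new start line closes the event collected so far.
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        # The trailing event (in Filebeat this is where the timeout matters).
        events.append("\n".join(current))
    return events


# Shortened, hypothetical version of the sample file from the thread.
sample = [
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
    '<?xml-stylesheet href="ticket.xsl" type="text/xsl"?>',
    "<HF_DOCUMENT>",
    "    <H_Ticket>",
    "        <IDH_Ticket>31</IDH_Ticket>",
    "    </H_Ticket>",
    "    <H_Ticket>",
    "        <IDH_Ticket>32</IDH_Ticket>",
    "    </H_Ticket>",
    "</HF_DOCUMENT>",
]

events = group_multiline(sample, "<H_Ticket")

# Assumption: exclude_lines sees the combined event; without a multi-line
# flag, "^.*xml" can only match within the first line of the event.
exclude = re.compile("^.*xml")
kept = [event for event in events if not exclude.search(event)]

for event in kept:
    print(repr(event))
```

In this model the three header lines form their own first event, which the exclude_lines pattern drops, while the closing </HF_DOCUMENT> line ends up attached to the last ticket. If that matches what Filebeat does, the last event would need extra handling on the Logstash side.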


(David Li) #7

I hope your problem is solved. Just a suggestion: can we have a more detailed subject line for question topics in the future? That way people have a better chance of finding what they need when searching, and they don't create duplicate topics.

