I have an XML file that another program updates at arbitrary times by appending new documents.
This file is also initialized every day by depopulating it.
I configured Filebeat to capture every XML document in this file that matches the format <H_Ticket>...</H_Ticket>, using this configuration:
filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - C:\busesdata\*.xml
      input_type: log
      exclude_lines: ["^.*xml"]
      #ignore_older: 10s
      #close_older: 1h
      document_type: ticket
      scan_frequency: 15s
      multiline:
        pattern: '<H_Ticket'
        negate: true
        match: after
output:
  ### Logstash as output
  logstash:
    hosts: ["localhost:5044"]
    index: filebeat
It works very well when several XML documents are appended to the file at once, but it sends an empty event when a single document is appended, for example:
First, is this behavior due to a wrong multiline setting, some other misconfiguration, or something else?
Second, in my case, do I have to use the ignore_older and close_older parameters to guarantee a smooth pipeline? If yes, how should they be set in my case?
Do you have a newline at the end of the single event? I'm somewhat surprised that an empty event is sent. Be aware that multiline.timeout: 5s will apply to the last event in a file as long as no new event is added.
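To make the multiline point a bit more concrete, here is a rough Python sketch of how I understand the grouping with pattern '<H_Ticket', negate: true and match: after. It is a simplified model and not Filebeat's actual code: the function name `group_multiline` and the sample `<id>` elements are made up for illustration, and the timeout is only modelled as "whatever is still buffered at the end".

```python
import re


def group_multiline(lines, pattern):
    """Simplified model of negate: true / match: after.

    A line that matches `pattern` starts a new event; every line that
    does not match is appended to the event currently being built.
    Returns (completed_events, pending_event).
    """
    regex = re.compile(pattern)
    events = []
    current = []
    for line in lines:
        # A matching line closes the event being built (if any)
        # and becomes the first line of the next one.
        if regex.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    # Whatever is left is still buffered: in this model it would only be
    # sent once a new matching line arrives or the multiline timeout fires.
    pending = "\n".join(current) if current else None
    return events, pending


# Hypothetical file content: an XML declaration followed by two tickets.
sample = [
    '<?xml version="1.0"?>',
    "<H_Ticket>",
    "  <id>1</id>",
    "</H_Ticket>",
    "<H_Ticket>",
    "  <id>2</id>",
    "</H_Ticket>",
]

events, pending = group_multiline(sample, "<H_Ticket")
for event in events:
    print("completed:", repr(event))
print("pending:  ", repr(pending))
```

In this model the lines before the first <H_Ticket form an event of their own, and the last ticket stays buffered until something flushes it. That is why it matters whether a single appended document ends with a newline and what else (if anything) sits in front of it.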
Are the events appended to the file identical for single or combined events?
What do you mean by "initialized"? Is the same file truncated, or is it deleted and a new one created with the same name?
Sorry to ask again, but depopulated = truncated the file = remove all content inside the file?
Can you share 2 full events? That will make it easier to see whether something is wrong with multiline or exclude_lines. Did you ever remove exclude_lines and check whether everything works as expected?
Sorry for the late reply. Just to be sure: you don't want the full content between <HF_DOCUMENT as one event, but each sub-entry in <H_Ticket>... as its own event? I assume every event starts like this, so the first three lines and the last line should never be sent?
I hope your problem is solved. Just a suggestion: could we have a more detailed subject line for the topic in the future? That way people have a better chance of finding what they need when searching, and they don't create duplicate topics.