Skip first few lines in file


#1

Is it possible to skip lines in a log file (i.e. not send them to logstash)?

The log I'm working with has 4 lines at the top which I would like to ignore and not send to logstash. Is this possible?


(Joshua Rich) #2

You could match the lines with a regexp then use the drop filter. So say the lines are comments starting with # then this filter snippet will remove them:

if [message] =~ /^#/ {
  drop { }
}

#3

Hi Joshua,

Thanks for the tip.

Unfortunately each line is different. Will I need 4 different ifs or can I just use an OR and put in all my conditions?

In terms of the syntax, does it matter if this goes before or after the "if" grok?

I have:

filter {
  if [type] == "mylogtype" {
    grok {....}
  }
}

P.S. How do you get that pretty looking code in your post?


(Joshua Rich) #4

Hey @tweetybird,

Yep, sounds like you'll need a few ORs, so something like:

if [message] =~ /^match1/ or [message] =~ /^match2/ ...

This should go before your grok, because all filters and conditional blocks are evaluated in the order they appear.
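
Putting those pieces together, a minimal sketch of the whole filter section might look like the one below. The prefixes match1 and match2 are the placeholders from the line above, and the grok pattern is purely an assumption for illustration, so substitute your own:

```
filter {
  if [type] == "mylogtype" {
    # Drop the header lines first, so they never reach grok.
    # match1 / match2 are placeholders for the real line prefixes.
    if [message] =~ /^match1/ or [message] =~ /^match2/ {
      drop { }
    }

    # Only events that survived the drop above are parsed here.
    # Hypothetical pattern; replace it with the one for your log format.
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:msg}" }
    }
  }
}
```

If the list of ORs gets long, a single regexp with alternation, such as /^(match1|match2)/, should behave the same way.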

To get the pretty formatting, use three ` on a single line at the beginning and end of your code block.


#5

Hi Joshua,

That seems to have done the trick. The very first line in the file is still getting picked up, but that seems to be related to an encoding problem, as the line starts with some kind of UTF-16 marker.

In the logstash logs I see the first character on the first line is showing as <U+FEFF>

From Google it seems like that might be utf-16be, but when I tried that, filebeat didn't seem to work correctly (this is on Windows).

What I have now in the filebeat.yml is utf-16le. Before it was set to plain and logstash logs had all kinds of \0000\0004 type things in the logs for the message.

I also tried utf-16be-bom as shown in the comments of the yml but that didn't seem to work either.

Is there a quick fix that I'm missing for what seems to be a minor encoding problem?

[edited for clarity]


#6

Just to clarify a little better:

With utf-16be, it detects the log files but doesn't detect any changes in them (filebeat logs show zero changes).

with utf-16be-bom, I see the following in the filebeat log:
ERR Error initializing harvester: unknown encoding('utf-16be-bom'

If I leave the encoding at plain, the lines are shipped to logstash but most of them have a tags field that says _grokparsefailure

When I open the file in Notepad++, the little icon at the bottom says UCS-2 Little Endian, which is why I tried utf-16le. Except for the first character of the first line, everything seems to work.

Would be nice to get that first line figured out...


(Joshua Rich) #7

Hmm, it's possible you are hitting this bug. Fix is merged and coming in filebeat 1.1.


(ruflin) #8

In the next version of filebeat (1.1) you can do the same directly on the filebeat side with exclude_lines: https://github.com/elastic/beats/pull/430
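
As a rough sketch of what that could look like in filebeat.yml once 1.1 is out, assuming the 1.x prospector layout. The path is hypothetical, and the "^#" regexp is just the comment-line example from earlier in this thread:

```
filebeat:
  prospectors:
    - paths:
        # Hypothetical path; point this at the real log file.
        - /var/log/myapp/*.log
      # Lines matching any regexp in this list are not shipped.
      # Add one entry per header line you want to skip.
      exclude_lines: ["^#"]
```

With this in place the header lines never leave the machine, so the drop conditional in logstash should no longer be needed for them.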


(Steffen Siering) #9

The general problem with UTF-16 is the byte order: big endian or little endian. That's why the BOM (byte order mark) was introduced, so that processors can detect the endianness. If no BOM is given, the default endianness is supposed to be big endian, but Microsoft decided otherwise, so for UTF-16 files generated on Windows you will mostly have to deal with utf-16le. Unfortunately, the BOM is a little tricky to read if the file stays empty for some seconds after creation, but with 1.1 we introduced the encodings utf-16be-bom, utf-16le-bom and utf-16-bom.
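
For a file that Notepad++ reports as UCS-2 Little Endian, a setting along these lines may be worth trying on 1.1. This is only a sketch: the path is hypothetical, and picking utf-16le-bom assumes the file really is little endian with a leading BOM:

```
filebeat:
  prospectors:
    - paths:
        # Hypothetical path; point this at the real log file.
        - /var/log/myapp/*.log
      # Assumption: little-endian UTF-16 with a BOM at the start of the file.
      # Requires filebeat 1.1 or later; 1.0 rejects the *-bom names.
      encoding: utf-16le-bom
```

If the <U+FEFF> character still shows up at the start of the first line, utf-16-bom is the other new name that could be tested.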


#10

Sounds like the best thing to do is wait for v1.1 and see if it helps. It would be easier to simply tell filebeat to ignore the first x lines, but I guess the upcoming regex solution works too :)


(ruflin) #11

The 1.1.0 snapshots are already available here and should be quite stable: https://beats-nightlies.s3.amazonaws.com/index.html?prefix=filebeat/

