How to prevent empty lines


#1

Hello,

I'm currently using Logstash 2.1.1 to retrieve data from Elasticsearch 1.4.4 and put it into a CSV to be imported into a database. My problem is that there are lines that contain only commas (like this: ",,,,,,,,,,").
Obviously, when trying to import it into PostgreSQL this produces an error. Is there any way for Logstash to prevent this from happening?

My configuration file is the following:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

(Attila Boncok) #2

Drop all messages that contain only commas before the output?

filter {
    if [message] =~ '/^,+$/' {
        drop { }
    }
}

#3

Hello,

I've tried to use that, but I had an error:

SyntaxError: (eval):54: syntax error, unexpected ','
              if (((event["[message]"] =~ //^,+$//))) # if [message] =~ '/^,+$/'

(Attila Boncok) #4

Unexpected comma? Try to escape it.

if [message] =~ '/^\,+$/'


#5

Different error:

SyntaxError: (eval):54: syntax error, unexpected null
              if (((event["[message]"] =~ //^\,+$//))) # if [message] =~ '/^\,+$/'

(Attila Boncok) #6

You just gotta love how regex syntax differs from app to app.

I can't test it myself, so two other variants to try:

if [message] =~ /^,+$/
if [message] =~ '^,+$'


(Robert Cowart) #7

You don't need the single quotes. I use this method all the time... if [message] =~ /^,+$/

For example...

if [log][message] =~ /^DHCPOFFER .*$/ {

#8

I'm currently using the first suggestion @atira made, without the quotes as you said. It didn't return any error. I'll let it run, and when it finishes I'll let you know whether this configuration worked.


#9

After the configuration finished running I went to check the file, and I still have some lines with nothing but commas. Do you have any other suggestions?


#10

Just to clarify, right now my configuration is like this:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

filter {
    if [message] =~ /^,+$/ {
        drop { }
    }
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

Despite this configuration I still have lines with only commas.


(Attila Boncok) #11

Ah okay. If your message field contains more than just commas, or no commas at all, then the conditional won't work.
All we know is that the output produces lines with only commas. So does the input produce an empty line? Though I wonder how.

If that's the case, modify the filter to this:

filter {
    if [message] =~ /^$/ {
        drop { }
    }
}

This should delete all empty messages that come from the input.


#12

Hello @atira, the solution you proposed didn't work :frowning:

Out of more than 800k lines I still have 34 that are just ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


(Attila Boncok) #13

Uncharted territory for me, maybe someone else will look here too.

Brainstorming mode.
It'd be good to know exactly what events the input creates.
Can you configure e.g. a file output? I presume that would simply write out the events to a file without any transformation. Then we would know what the csv output receives.
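
Something along these lines might do it (untested; the path is just a placeholder):

output {
	file {
		path => "/tmp/raw_events.txt"
		codec => rubydebug
	}
}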


(Guy Boertje) #14

This is what is happening under the covers...

bin/logstash -i irb
Sending Logstash's logs to /Users/guy/tmp/logstash-6.2.4/logs which is now configured via log4j2.properties
irb(main):001:0> event = LogStash::Event.new
=> #<LogStash::Event:0x6ed0a4aa>
irb(main):002:0> event.get("[foo]")
=> nil
irb(main):003:0> line = 5.times.map{ event.get("[foo]") }
=> [nil, nil, nil, nil, nil]
irb(main):004:0> require 'csv'
=> true
irb(main):005:0> line.to_csv
=> ",,,,\n"
irb(main):006:0>

You have some documents coming from Elasticsearch that have a different schema and do not have any of the fields that the CSV output needs.


#15

Right now I'm extracting everything I have in the index, and then I'll try to find the positions where I have problems. Once I have any news I'll let you guys know.


(Guy Boertje) #16

You can use a conditional to check for the existence of important fields and drop in the else branch.

  if [field1] and [field2] and [fieldN] {
    # do transforms on the "good" docs
  } else {
    drop {}
  }
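
For example, wrapped in a complete filter block (the field names are just placeholders for your real ones):

filter {
  if [field1] and [field2] and [fieldN] {
    # keep these "good" docs and do any transforms here
  } else {
    drop { }
  }
}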

#17

I'll give that a go. But now I'm trying to analyse a 5GB text file to try to spot what might be causing this problem.

The solution that @atira proposed with:

if [message] =~ /^$/ {
        drop { }
}

made sense to me, but it didn't work.

Regarding what you proposed, what transforms should I do?


(Guy Boertje) #18

Any transforms you need can go there - it's just a bit harder to do with a negated if condition.

if !( [field1] and [field2] and [fieldN] ) {
  drop {}
}

The reason Attila's suggestion does not work in this case is that the event or document sourced from Elasticsearch already has all the fields.

Do you understand the "what is happening under the covers" code block?


#19

Yes, negating would also work for me, because this index has more than 300 fields and I just want 40.

I think I understood. I'm just a little bit confused regarding the syntax you suggested.

It should be like this:

  if [xxx] and [xxx] and [xxx] {
    # can I do nothing here?
  } else {
    drop {}
  }

(Guy Boertje) #20

The syntax of the negated conditional?