How to prevent empty lines

Hello,

I'm currently using Logstash 2.1.1 to retrieve data from Elasticsearch 1.4.4 and write it to a CSV to be imported into a database. My problem is that there are lines that contain only commas (like this: ",,,,,,,,,,").
Obviously, importing those into PostgreSQL produces an error. Is there any way for Logstash to prevent this from happening?

My configuration file is the following:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

Drop all messages that contain only commas before they reach the output?

filter {
    if [message] =~ '/^,+$/' {
        drop { }
    }
}

Hello,

I've tried using that, but I got an error:

SyntaxError: (eval):54: syntax error, unexpected ','
              if (((event["[message]"] =~ //^,+$//))) # if [message] =~ '/^,+$/'

Unexpected comma? Try to escape it.

if [message] =~ '/^\,+$/'

Different error:

SyntaxError: (eval):54: syntax error, unexpected null
              if (((event["[message]"] =~ //^\,+$//))) # if [message] =~ '/^\,+$/'

You just gotta love how regex syntax differs from app to app.

I can't test it myself, so two other variants to try:

if [message] =~ /^,+$/
if [message] =~ '^,+$'

You don't need the single quotes. I use this method all the time... if [message] =~ /^,+$/

For example...

if [log][message] =~ /^DHCPOFFER .*$/ {

I'm currently using the first suggestion @atira made, without the quotes, like you said. It didn't return any error. I'll let it run, and when it finishes I'll let you know whether this configuration worked.

After the run finished I checked the file, and I still have some lines with nothing but commas. Do you have any other suggestions?

Just to clarify, right now my configuration is like this:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

filter {
    if [message] =~ /^,+$/ {
        drop { }
    }
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

Despite this configuration I still have lines with only commas.

Ah okay. If your message field contains more than just commas, or no commas at all, then the conditional won't work.
All we know is that the output produces lines with only commas. So the input produces an empty line? Though I wonder how.

If that's the case, modify the filter to this:

filter {
    if [message] =~ /^$/ {
        drop { }
    }
}

This should delete all empty messages that come from the input.

Hello @atira, the solution you proposed didn't work :frowning:

In a universe of more than 800k lines I still have 34 that are just ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Uncharted territory for me, maybe someone else will look here too.

Brainstorming mode.
It'd be good to know exactly what events the input creates.
Can you configure e.g. a file output? I presume that would simply write the events out to a file without any transformation. Then we would know what the csv output receives.
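
A rough sketch of what such a debug output could look like (the path is just a placeholder, and json_lines is only a suggestion so that each event ends up as one JSON object per line):

output {
	file {
		# write each event out verbatim, one JSON document per line
		path => "debug_events.jsonl"
		codec => json_lines
	}
}

That should make it easy to spot documents that are missing the fields the csv output expects.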

This is what is happening under the covers...

bin/logstash -i irb
Sending Logstash's logs to /Users/guy/tmp/logstash-6.2.4/logs which is now configured via log4j2.properties
irb(main):001:0> event = LogStash::Event.new
=> #<LogStash::Event:0x6ed0a4aa>
irb(main):002:0> event.get("[foo]")
=> nil
irb(main):003:0> line = 5.times.map{ event.get("[foo]") }
=> [nil, nil, nil, nil, nil]
irb(main):004:0> require 'csv'
=> true
irb(main):005:0> line.to_csv
=> ",,,,\n"
irb(main):006:0>

You have some documents coming from Elasticsearch that are a different schema and do not have any of the fields that the CSV output needs.

Right now I'm extracting everything I have in the index, and then I'll try to find the positions where I have problems. Once I have any news I'll let you guys know.

You can use a conditional to check for the existence of important fields and drop in the else branch.

  if [field1] and [field2] and [fieldN] {
    # do transforms on the "good" docs
  } else {
    drop {}
  }
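
In the context of your pipeline, that check could sit in the filter section in place of the regex conditional, something like this (the field names are placeholders, just like the xxx values in your config):

filter {
	# keep only documents that carry the fields the csv output needs
	if [field1] and [field2] and [fieldN] {
		# any transforms on the "good" docs go here
	} else {
		drop { }
	}
}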

I'll give that a go. But now I'm trying to analyse a 5GB text file to try to spot what might be causing this problem.

The solution that @atira proposed with:

if [message] =~ /^$/ {
        drop { }
}

made sense to me, but it didn't work.

Regarding what you proposed, what transforms should I do?

Any transforms you need can go there - it's just a bit harder to do a negated if condition.

if !( [field1] and [field2] and [fieldN] ) {
  drop {}
}

The reason why Attila's suggestion does not work in this case is that the event or document sourced from Elasticsearch already has all its fields.

Do you understand the "what is happening under the covers" code block?

Yes, negating would work better for me because this index has more than 300 fields and I just want 40.

I think I understood. I'm just a little bit confused regarding the syntax you suggested.

It should be like this:

if [xxx] and [xxx] and [xxx] {
    #can I do nothing here?
  } else {
    drop {}
  }

The syntax of the negated conditional?