How to prevent empty lines


#1

Hello,

I'm currently using Logstash 2.1.1 to retrieve data from Elasticsearch 1.4.4 and put it into a CSV to be imported into a database. My problem is that there are lines that contain only commas (like this: ",,,,,,,,,,").
Obviously, when trying to import it into PostgreSQL this produces an error. Is there any way for Logstash to prevent this from happening?

My configuration file is the following:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

(Attila Boncok) #2

Drop all messages that contain only commas before the output?

filter {
    if [message] =~ '/^,+$/' {
        drop { }
    }
}

#3

Hello,

I've tried to use that, but I had an error:

SyntaxError: (eval):54: syntax error, unexpected ','
              if (((event["[message]"] =~ //^,+$//))) # if [message] =~ '/^,+$/'

(Attila Boncok) #4

Unexpected comma? Try to escape it.

if [message] =~ '/^\,+$/'


#5

Different error:

SyntaxError: (eval):54: syntax error, unexpected null
              if (((event["[message]"] =~ //^\,+$//))) # if [message] =~ '/^\,+$/'

(Attila Boncok) #6

You just gotta love how regex syntax differs from app to app.

I can't test it myself, so two other variants to try:

if [message] =~ /^,+$/
if [message] =~ '^,+$'


(Robert Cowart) #7

You don't need the single quotes. I use this method all the time... if [message] =~ /^,+$/

For example...

if [log][message] =~ /^DHCPOFFER .*$/ {

#8

I'm currently using the first suggestion @atira made, without the quotes as you said. It didn't return any error. I'll let it run, and when it finishes I'll let you know whether this configuration worked.


#9

After the configuration finished running I went to check the file, and I still have some lines with nothing but commas. Do you have any other suggestions?


#10

Just to clarify, right now my configuration is like this:

input {
	elasticsearch {
		hosts => "xxx"
		index => "xxx"
		scroll => "1m"
		
	}
}

filter {
    if [message] =~ /^,+$/ {
        drop { }
    }
}

output {
	csv {
		path => "output_Int_1thread_nofilter.csv"
		fields => ["xxx", "xxx","xxx","xxx", "xxx","xxx", "xxx", "xxx"]
	}
}

Despite this configuration I still have lines with only commas.


(Attila Boncok) #11

Ah okay. If your message field contains more than just commas, or no commas at all, then the conditional won't work.
All we know is that the output produces lines with only commas. So does the input produce an empty line? Though I wonder how.

If that's the case, modify the filter to this:

filter {
    if [message] =~ /^$/ {
        drop { }
    }
}

This should delete all empty messages that come from the input.


#12

Hello @atira, the solution you proposed didn't work :frowning:

Out of more than 800k lines I still have 34 that are just ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


(Attila Boncok) #13

Uncharted territory for me, maybe someone else will look here too.

Brainstorming mode.
It'd be good to know exactly what events the input creates.
Can you configure e.g. a file output? I presume that would simply write out the events to a file without any transformation. Then we would know what the csv output receives.
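
Something along these lines might do it (untested; the path is just a placeholder):

output {
	file {
		path => "/tmp/raw_events.txt"
		codec => rubydebug
	}
}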


(Guy Boertje) #14

This is what is happening under the covers...

bin/logstash -i irb
Sending Logstash's logs to /Users/guy/tmp/logstash-6.2.4/logs which is now configured via log4j2.properties
irb(main):001:0> event = LogStash::Event.new
=> #<LogStash::Event:0x6ed0a4aa>
irb(main):002:0> event.get("[foo]")
=> nil
irb(main):003:0> line = 5.times.map{ event.get("[foo]") }
=> [nil, nil, nil, nil, nil]
irb(main):004:0> require 'csv'
=> true
irb(main):005:0> line.to_csv
=> ",,,,\n"
irb(main):006:0>

You have some documents coming from Elasticsearch that have a different schema and do not have any of the fields that the CSV output needs.


#15

Right now I'm extracting everything I have in the index, and then I'll try to find the positions where I have problems. Once I have any news I'll let you guys know.


(Guy Boertje) #16

You can use a conditional to check for the existence of important fields and drop in the else branch.

  if [field1] and [field2] and [fieldN] {
    # do transforms on the "good" docs
  } else {
    drop {}
  }
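
For example, wrapped in a complete filter block (the field names are just placeholders for your real ones):

filter {
  if [field1] and [field2] and [fieldN] {
    # keep these "good" docs and do any transforms here
  } else {
    drop { }
  }
}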

#17

I'll give that a go. But now I'm trying to analyse a 5GB text file to try to spot what might be causing this problem.

The solution that @atira proposed with:

if [message] =~ /^$/ {
        drop { }
}

made sense to me, but it didn't work.

Regarding what you proposed, what transforms should I do?


(Guy Boertje) #18

Any transforms you need can go there - it's just a bit harder to do with a negated if condition.

if !( [field1] and [field2] and [fieldN] ) {
  drop {}
}

The reason Attila's suggestion does not work in this case is that the event or document sourced from Elasticsearch already has all the fields.

Do you understand the "what is happening under the covers" code block?


#19

Yes, negating would also work for me, because this index has more than 300 fields and I just want 40.

I think I understood. I'm just a little bit confused regarding the syntax you suggested.

It should be like this:

  if [xxx] and [xxx] and [xxx] {
    # can I do nothing here?
  } else {
    drop {}
  }

(Guy Boertje) #20

The syntax of the negated conditional?