Error parsing CSV file // Logstash

Hi,

I'm checking my Logstash logs and I see many WARN entries saying "Error parsing csv" (80% of the lines are parsed correctly). Could you help me out? Many thanks in advance.

Furthermore, the "date" field is not being applied to the timestamp in the index.

Logstash conf file:

input {
  file {
    path => "/home/adminfran/desktop/monitoring/act_final3.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns => ["id tweet","date","author","text","app","id user","followers","following","stauses","location","urls","geolocation","name","description","url_media","type media","quoted","relation","replied_id","user replied","retweeted_id","user retweeted","quoted_id","user quoted","first HT","lang","created_at","verified","avatar","link"]
    separator => "	"  # tab
  }
  date {
    match => ["date","yyyy-MM-dd HH:mm:ss"]
    timezone => "UTC"
    target => "date"
  }
  mutate {
    remove_field => ["message"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "monitoring_twitter"
  }
  stdout { codec => rubydebug }
}

Error:
[WARN ] 2019-12-13 13:03:17.881 [[main]>worker1] csv - Error parsing csv {:field=>"message", :source=>"1146480753682898944\t2019-07-03 18:08:06\t@monterhusmore\tRT @Sertemus: Sagen sie mir, wenn ich falsch liege: hatten polizeibeamte, die @KRLS beobachteten, bereits einen gerichtsbeschluss? Wenn nicht, wer erteilt ihnen den auftrag, diese überwachung auf deutschem gebiet durchzuführen? @DerSPIEGEL @nytimes @guardian @BILD @lalsace @washingtonpost https://t.co/XsVtCC0FQi\tTwitter for Android\t2761881400\t2280\t4979\t200492\tNone\thttps://twitter.com/portet_bruguera/status/1146069575462637570\tNone\tMONTERHUSMORE.\tNone\tNone\tNone\tMHP @KRLS "El TJUE ha de resoldre, ha d'intervenir. Alguns ens voldrien silenciats, quiets i tancats. No hem callat, no ens aturarem i no ens rendirem. Deixeu-me dir ben clar: visca Europa i visca Catalunya Lliure" #PersisitimiGuanyarem https://t.co/5HsnXf6pGO\tquote\tNone\tNone\tNone\tNone\t1146069575462637570\tportet_bruguera\tNone\tde\t2014-09-05 10:39:56\tFalse\thttps://pbs.twimg.com/profile_images/1139420260917006336/dUbIjzwm_normal.jpg\thttps://twitter.com/monterhusmore/status/1146480753682898944", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
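
On the "date" part of the question: the date filter in the config above sets target => "date", so the parsed value is written back into the "date" field rather than into @timestamp. A minimal sketch, assuming the goal is for each tweet's date to become the event's @timestamp (which is the date filter's default target when no target is given):

```
filter {
  # Must come after the csv filter, which is what creates the "date" field.
  date {
    match    => ["date", "yyyy-MM-dd HH:mm:ss"]
    timezone => "UTC"
    # No "target" option: the parsed value goes to @timestamp by default.
  }
}
```

Lines that fail CSV parsing never get a "date" field, so those events would presumably keep the time at which Logstash read them.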

In a CSV file, a field that uses quotes has to be quoted in its entirety, and any quote inside it is written as two consecutive double quotes (""). You cannot quote only part of a field.
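
One possible way to act on this, sketched under assumptions: the csv filter has a quote_char option (its default is the double quote), so if no field in the file is really quoted, quote handling can be pointed at a character that never occurs in the data. The "^" below is purely hypothetical and chosen for illustration; it would have to be a character that truly never appears in any tweet, otherwise the same "Illegal quoting" error comes back.

```
filter {
  csv {
    # separator and columns exactly as in the original config

    # Assumption: "^" never occurs anywhere in the file.
    # With this set, a double quote is treated as an ordinary character.
    quote_char => "^"
  }
}
```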

So I'm trying to transform the field values to get rid of this character, but I do not know how to do it.
I tried this:
mutate {
gsub => ["message", """, "''"]
remove_field => ["message"]
}
But it is not working. Could you help me?

Hi

You have to escape the characters:

gsub => ["message","\"","\'"]

(The single quote may not need escaping; give it a try.)

Also, you should probably not remove "message" at this stage.

Hope this helps

This is what I get in Visual Studio Code:

ERROR:

[ERROR] 2019-12-17 10:38:47.948 [Converge PipelineAction::Create] agent - Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of [ \t\r\n], "#", "{", ",", "]" at line 21, column 39 (byte 855) after filter {\n csv {\n\t\t\t\t columns => ["id tweet","date","author","text","app","id user","followers","following","stauses","location","urls","geolocation","name","description","url_media","type media","quoted","relation","replied_id","user replied","retweeted_id","user retweeted","quoted_id","user quoted","first HT","lang","created_at","verified","avatar","link"]\n\t\t\t\t separator => "\t" #tabulaciones\n\t\t\t\t }\n\t\t\t date {\n match => ["date","yyyy-MM-dd HH:mm:ss"]\n timezone => "UTC"\n target => "@timestamp"\n\t\t\t }\n \n mutate {\n gsub => ["message", "\""", :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:41:in compile_imperative'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:49:in compile_graph'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:11:in block in compile_sources'", "org/jruby/RubyArray.java:2584:in map'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:10:in compile_sources'", "org/logstash/execution/AbstractPipelineExt.java:156:in initialize'", "org/logstash/execution/JavaBasePipelineExt.java:47:in initialize'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:26:in initialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:36:in execute'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:326:in block in converge_state'"]}

Hi

Try without escaping the single quote:

gsub => ["message","\"","'"]

Sometimes comments cause problems in the config. Try removing the "remove_field" line altogether.
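
A sketch of how the suggestions so far might fit together, under two assumptions: filters run in the order they are written, so the gsub has to come before the csv filter (which is the one that chokes on the quotes), and "message" is only dropped once parsing has succeeded. Writing the double quote inside a single-quoted string sidesteps the backslash escaping entirely. The _csvparsefailure tag is the one the csv filter adds by default when a line cannot be parsed.

```
filter {
  # 1. Clean the raw line first, while it is still in "message".
  mutate {
    gsub => ["message", '"', "'"]
  }

  # 2. Parse the cleaned line.
  csv {
    # separator and columns exactly as in the original config
  }

  # 3. Drop the raw line only for events that parsed correctly,
  #    so failed lines can still be inspected.
  if "_csvparsefailure" not in [tags] {
    mutate {
      remove_field => ["message"]
    }
  }
}
```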

It is not working.

Sorry, it is working, but the lines are still not parsed correctly. Look at this picture:

Hi

Could you share a line or two from your actual .csv file?

Could you remove all filters and share the output from stdout for those lines?

CSV fields:

"id tweet date author text app id user followers following stauses location urls geolocation name description url_media type media quoted relation replied_id user replied retweeted_id user retweeted quoted_id user quoted first HT lang created_at verified avatar link"

CSV lines:

"1202156044983689216 2019-12-04 09:21:50 @BCibada RT @omnium: 🔸 Avui @omniumintl engeguem una campanya a França, al Regne Unit i a Alemanya perquè els ciutadans europeus demanin als seus líders polítics que s'impliquin a trobar una solució política per Catalunya 📲 #ActForCatalonia https://t.co/Lg1sS7ZRcO Twitter for Android 882852012215369732 1173 2482 70875 https://www.omnium.cat/ca/omnium-interpella-massivament-els-principals-liders-europeus-perque-treballin-a-favor-duna-solucio-politica/ None Blad Cibada 🎗️ 🏳️‍🌈 💜 Defending Progressive views on Civil Rights, Politics and Economics. None None None RT None None 1202155081589760000 @omnium None None ActForCatalonia ca 2017-07-06 06:41:37 False https://pbs.twimg.com/profile_images/971692733818695681/eMFMibIr_normal.jpg https://twitter.com/BCibada/status/1202156044983689216"

"1202155864821567488 2019-12-04 09:21:07 @OmniumIntl 🇩🇪 Wir starten die Kampagne #ActForCatalonia! Wir bitten dabei die Bürger dieser Länder, sich mit ihren Volksvertretern in Verbindung zu setzen, damit diese dabei helfen, eine politische Lösung für Katalonien zu finden: https://t.co/hYXZMCPqub Twitter Web App 1049970228854226946 9350 791 1605 Barcelona, Catalonia https://www.omnium.cat/de/actforcatalonia/ None Òmnium International Freedom for the Catalan political prisoners. Catalonia deserves a political solution. None None None reply 1202155863286439936 @OmniumIntl None None None None ActForCatalonia de 2018-10-10 10:29:44 True https://pbs.twimg.com/profile_images/1050417217714757633/8R8m4bDO_normal.jpg https://twitter.com/OmniumIntl/status/1202155864821567488"

stdout:

Hi

The CSV lines you posted don't contain tabs, only spaces, but that might be due to pasting them here. In your "message", though, the separator is "\t".

Since the CSV filter is parsing your "message", try using "\t" as separator in your CSV filter, or maybe "\\t".

Besides, you have emoji and other symbols in your CSV (and in your "message"), and those might be giving you a hard time.

Hope this helps.
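
A note on the "\t" suggestion, as a hedged sketch: Logstash only interprets backslash escapes such as \t inside quoted config strings when config.support_escapes is set to true in logstash.yml. Without that setting, "\t" is read as a backslash followed by the letter t, and a literal tab typed between the quotes (as in the original config) is the form that works.

```
# Assumes logstash.yml contains:  config.support_escapes: true
filter {
  csv {
    # columns exactly as in the original config
    separator => "\t"   # read as a tab only when support_escapes is enabled
  }
}
```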

Using NotePad.

Thank you for your help. I appreciate it.