Filter csv Logstash

Hi, everyone.
I am having a problem adding a CSV file to Elasticsearch. Could you kindly help me understand what is wrong?
My Logstash config:

input {
  file {
    path => "C:\Saratov_report\ *.csv"
    start_position => "beginning"
    sincedb_path => "nul"
    tags => ["report"]
  }
}

filter {
  if "report" in [tags] {
    csv {
      separator => ";"
      columns => ["id", "availability", "address_of_the_object", "start_accident", "end_accident", "reason_of_service_affect", "alarm description", "responsibility", "provider_name", "TT", "classificator", "duration(Seconds)"]
    }
  }
}

output {
  if "report" in [tags] {
    elasticsearch {
      hosts => "localhost:9200"
      index => "report-%{+YYYY.MM.dd}"
    }
  }
}

In Kibana I don't see the "report" index.

Exactly what problem are you having?

In Kibana I don't see the "report" index.

Use forward slash, not backslash, in Windows file paths.
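
For reference, a minimal sketch of the file input with forward slashes. It assumes the CSV files sit directly in C:/Saratov_report; the other settings are copied unchanged from the config posted above.

```
input {
  file {
    # Forward slashes instead of backslashes in the Windows path.
    path => "C:/Saratov_report/*.csv"
    start_position => "beginning"
    sincedb_path => "nul"
    tags => ["report"]
  }
}
```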

One second, I will check.

Great, I see the "report" index. You were right! Thanks!
But now I have a parser error. Maybe somebody knows how Logstash works with the Russian language (the file contains a lot of Russian words).

[logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"1877;\xE4\xEE\xF1\xF2\xF3\xEF\xE5\xED \xEF\xEE \xF0\xE5\xE7\xE5\xF0\xE2\xF3;\"\xCE\xCE \"\"\xD0\xFF\xE7\xE0\xED\xFC \xB9 3\"\" \xE3. \xD0\xFF\xE7\xE0\xED\xFC, \xF3\xEB. \xCA\xEE\xEB\xFC\xF6\xEE\xE2\xE0, \xE4. 8_\";29.08.2017 1:08;29.08.2017 1:10;\xD1\xE1\xEE\xE9 \xF3 \xEF\xF0\xEE\xE2\xE0\xE9\xE4\xE5\xF0\xE0. \xCF\xF0\xEE\xE1\xEB\xE5\xEC\xFB \xED\xE0 \xF1\xE5\xF2\xE8 \xCE\xCF\xCC.;\xCF\xF0\xEE\xE1\xEB\xE5\xEC\xE0 \xED\xE0 \xEC\xE0\xF0\xF8\xF0\xF3\xF2\xE8\xE7\xE0\xF2\xEE\xF0\xE5 \xEF\xF0\xEE\xE2\xE0\xE9\xE4\xE5\xF0\xE0.;\xC1\xE8\xEB\xE0\xE9\xED;\xC1\xE8\xEB\xE0\xE9\xED;4051413;2;120;;\r", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}

You probably need to add a codec to your file input and set the charset option. The default codec for the file input is plain, which is OK, you just need to tell it what encoding to expect for your file.

So if I understand correctly, I need to add the codec information in the output section, for example like this:

stdout { codec => rubydebug }

right?

No, I am saying you need to replace

file {
  path => "C:\Saratov_report\ *.csv"
  start_position => "beginning"
  sincedb_path => "nul"
  tags => ["report"]
}

with

file {
  path => "C:\Saratov_report\ *.csv"
  start_position => "beginning"
  sincedb_path => "nul"
  tags => ["report"]
  codec => plain { charset => "?" }
}

I do not know what value you should have for charset. The default is UTF-8 and your encoding is not UTF-8. "Windows-1251" perhaps?

That would make the first word доступен. I cannot read Cyrillic, so I have no idea whether that makes any sense.
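
With the charset filled in, the input might look like the sketch below. "Windows-1251" is still an assumption about how the files were saved; if the text arrives garbled in Elasticsearch, a different Cyrillic encoding may be the right one. The path uses forward slashes, as suggested earlier in the thread.

```
input {
  file {
    path => "C:/Saratov_report/*.csv"
    start_position => "beginning"
    sincedb_path => "nul"
    tags => ["report"]
    # Assumed encoding of the source files; Logstash converts it to UTF-8.
    codec => plain { charset => "Windows-1251" }
  }
}
```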

Oh, I understand, thank you for showing me.

The information in the file looks like this:
1;полностью недоступен;Офис в г. Краснодар, ул. Северная, д. 288_;20.11.2016 23:50;21.11.2016 0:30;Отсутствие электропитания на оборудовании банка.;"Оборудование Банка было обесточено.";БРС;БРС;3100;1;2400;;

2;полностью недоступен;Офис в г. Москва, ул. Фестивальная, д. 2А_;21.11.2016 10:40;21.11.2016 11:55;Отсутствие электропитания на оборудовании банка.;электропитание отключено арендодателем.;БРС;БРС;3100;1;4500;;

The columns have the following structure:

"id",
"availability",
"address_of_the_object",
"start_accident",
"end_accident",
"reason_of_service_affect",
"alarm description",
"responsibility",
"provider_name",
"TT",
"classificator",
"duration(Seconds)"

I think you are right about Windows-1251.
Give me one moment, I will check.

Windows-1251 gave the same results, but it is not enough.

I checked the data in Elasticsearch and saw that all columns are empty.
Does that mean I need to add information about the data types?

"_index": "report-2019.03.27",
"_type": "doc",
"_id": "G7F4v2kBy37sTD7xcEd6",
"_version": 1,
"_score": null,
"_source": {
"path": "C:/Saratov_report/List_of_Alarm.csv",
"column2": null,
"column4": null,
"host": "srvocu01",
"column12": null,
"column1": null,
"@version": "1",
"column6": null,
"column9": null,
"message": ";;;;;;;;;;;;;\r",
"column8": null,
"column5": null,
"column11": null,
"@timestamp": "2019-03-27T14:05:34.820Z",
"column10": null,
"column3": null,
"column13": null,
"tags": [
"report"
],
"column14": null,
"column7": null
},
"fields": {
"@timestamp": [
"2019-03-27T14:05:34.820Z"

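The document above comes from a row that holds only separators (its message is ";;;;;;;;;;;;;\r"), so every column is empty in the source line itself, which does not point to a data type problem. One possible way to keep such rows out of the index is to drop them before the csv filter runs. This is a sketch, untested against the real file; the convert entry is optional and only illustrates how a type could be set for one column.

```
filter {
  if "report" in [tags] {
    # Drop lines that contain nothing but semicolons and whitespace
    # (the trailing carriage return counts as whitespace).
    if [message] =~ /^[;\s]*$/ {
      drop { }
    }

    csv {
      separator => ";"
      columns => ["id", "availability", "address_of_the_object", "start_accident", "end_accident", "reason_of_service_affect", "alarm description", "responsibility", "provider_name", "TT", "classificator", "duration(Seconds)"]
      # Optional: store the id as a number instead of a string.
      convert => { "id" => "integer" }
    }
  }
}
```
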
I will try to find more information about the codec. The error now looks like this:

[WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"";├Б├и├л├а├й├н;├Б├и├л├а├й├н;4942115;1;157;;\r", :exception=>#<CSV::MalformedCSVError: Unclosed quoted field on line 1.>}
[2019-03-27T17:05:31,054][WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"5082;├д├о├▒├▓├│├п├е├н ├п├о ├░├е├з├е├░├в├│;├К├а├н├а├л ├в Call Center ├К├а├з├а├н├╝ - ├В├а├╡├и├▓├о├в├а, ├г. ├К├а├з├а├н├╝, ├│├л. ├В├а├╡├и├▓├о├в├а, ├д. 8_(ID 4084089);03.12.2018 3:16;03.12.2018 3:27;├С├б├о├й ├│ ├п├░├о├в├а├й├д├е├░├а. ├П├░├о├б├л├е├м├╗ ├н├а ├▒├е├▓├и ├О├П├М.;"├А├в├а├░├и├┐ ├н├а ├▒├е├▓├и ├п├░├о├в├а├й├д├е├░├а тАЬ├Б├и├л├а├й├нтАЭ. ", :exception=>#<CSV::MalformedCSVError: Unclosed quoted field on line 1.>}
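
The two warnings above look like two halves of one record: the second source ends inside an open quoted field, and the first starts with the closing quote and the remaining columns. That would fit a line break inside a quoted field, which the file input splits into separate events. One possible approach, assuming every record starts with a numeric id followed by a semicolon, is to join the lines with the multiline codec. This is an untested sketch; the pattern and the charset value are assumptions, and leftover carriage returns may still need handling.

```
input {
  file {
    path => "C:/Saratov_report/*.csv"
    start_position => "beginning"
    sincedb_path => "nul"
    tags => ["report"]
    codec => multiline {
      # Assumed encoding of the source files.
      charset => "Windows-1251"
      # A line that does NOT start with "<digits>;" is appended
      # to the previous line, so a record split by a line break
      # inside a quoted field becomes one event again.
      pattern => "^\d+;"
      negate => true
      what => "previous"
      # Flush the last pending record after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}
```

Note that rows containing only separators also fail to match the pattern, so they would be appended to the preceding record rather than arriving as separate events.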
