Hello hello,
Quite new to the Elastic Stack, and I'm encountering a problem that even experienced colleagues can't figure out the reason for.
Bottom line - my data loads fine, except that almost every time I update the CSV, Logstash also ingests a broken document that starts at a random point in the middle of the last row. So every hour I get 0-3 broken documents like this:
Each document is built of 50 fields and expects a CSV input with 50 columns, the first being the TaskID, which is either a 6-7 digit integer or a string of the form 'TM-####'.
TaskID shouldn't load empty, or contain anything but the TaskID column's value.
The broken documents almost always come from the last row in the CSV, or the one before it. There are no unusual characters, and the break isn't always in the same place; it seems to pick a random start character and build the message from there.
Sometimes it loads just an empty document based on the last character of columns 49 and 50 of the last row, so TaskID remains empty and the message is just:
It shouldn't even be able to load a row that isn't 50 columns long.
The most common row (without sensitive info): TM-478, 2018-11-25, 2020-06-10, , Continuation of TASK 77212 freeze up sporadically for few seconds, , , , , , ,Diamond - High, , ,Assigned to Support, ,Management Products CFG, , , , , ,0, ,6.0, 96.0, , ,Diamond Americas 1-3, ,568.0 ,568.0 ,Yes, Management, , , , , ,2018, 11, , , ,Management Products, , , , Security Management Products, 0
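One suspicion worth ruling out (assuming the hourly job rewrites the CSV in place): Logstash's file input can pick the file up mid-write and tail whatever bytes are there, which would look exactly like a truncated last row. A common workaround is to write the export to a temp file and atomically rename it over the target, so the reader never sees a half-written file. A minimal sketch, assuming the exporter is (or could be) a Python script — the function name is hypothetical:

```python
import csv
import os
import tempfile

def write_csv_atomically(path, rows, encoding="ISO-8859-1"):
    """Write rows to a temp file in the target's directory, then rename it
    over the target. os.replace is atomic on the same filesystem, so a
    tailing reader never observes a partially written last line."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".csv.tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding=encoding) as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp_path, path)  # atomic swap; readers see old or new, never half
    except BaseException:
        os.remove(tmp_path)  # don't leave the temp file behind on failure
        raise
```

Note that on Windows `os.replace` can still fail if another process holds the target open with an exclusive lock, so this is a sketch to test, not a guaranteed fix.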
I have a basic CSV loader configured like this (company info removed):
input {
  file {
    path => "C:\CFS\ELK\task_raw.csv"
    start_position => "beginning"
    sincedb_path => "NUL"    # Windows equivalent of /dev/null
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
  file {
    path => "C:\CFS\ELK\jira_raw.csv"
    start_position => "beginning"
    sincedb_path => "NUL"    # Windows equivalent of /dev/null
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}
filter {
  csv {
    separator => ","
    columns => [50 columns]
  }
  date {
    locale => "en"
    match => ["CreateDate", "YYYY-MM-dd"]
    target => "@timestamp"
  }
  mutate {
    remove_field => [ "column", "column" ]
    convert => {
      "column1" => "float"
      "column2" => "integer"
      "column4" => "integer"
      "column5" => "integer"
      "column6" => "integer"
      "column7" => "integer"
    }
  }
  if "X" in [column] { drop { } }
}
output {
  elasticsearch {
    hosts => "localhost:9400"
    manage_template => false
    index => "taskraw_final_data_3"
    document_type => "taskraw_final_data_3"
    document_id => "id_%{TaskId}"
  }
}
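Until the root cause is found, a filter-side guard might at least keep the broken documents out of the index: drop any event whose TaskID is missing or doesn't match the expected shape. A sketch, assuming the first entry in the columns list is named "TaskID" (adjust the field name to the real column name):

```
filter {
  # Drop events whose TaskID is absent or isn't a 6-7 digit
  # integer or a 'TM-####' style string.
  if ![TaskID] or [TaskID] !~ /^(TM-\d+|\d{6,7})$/ {
    drop { }
  }
}
```

This doesn't explain the truncated reads, but it would stop the partial rows from ever reaching Elasticsearch.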
Any ideas on what might be causing this, or any "workaround" solutions, would be highly appreciated!
Thank you,
Noam