Hello,
We use Logstash to parse CSV data and load it into PostgreSQL; after applying the necessary transformations, we move the data from PostgreSQL to Elasticsearch, again with Logstash. The PostgreSQL-to-Elasticsearch transfer works fine, but the CSV reading step behaves inconsistently.
I have a CSV file with about 65,000 lines, encoded as US-ASCII. Logstash parses this file on a daily basis and stores the data in PostgreSQL, in a table with the same columns as the CSV. Usually Logstash reads the whole file without any problem, but on some days it reads only a small fraction of it, for example 20-30 lines out of 65k (the exact numbers vary from day to day), stores those in PostgreSQL, and reports no error.
I changed the logging level to debug but couldn't find anything useful. As a workaround, deleting the sincedb file makes Logstash read all of the data again and store it in the DB.
This happens randomly; there is no specific pattern to the occurrences.
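If I read the file input docs correctly, pointing sincedb_path at /dev/null makes Logstash forget the read position between restarts, which would automate my workaround at the cost of re-reading the whole file every time the pipeline restarts. A minimal sketch of what I mean (same paths as my real config):
input {
  file {
    path => "/app/data/sampleData*.csv"
    start_position => "beginning"
    add_field => { "[@metadata][appname]" => "sampleData" }
    # /dev/null never persists the read position, so every pipeline
    # restart re-reads the file from the beginning
    sincedb_path => "/dev/null"
  }
}
I'd prefer to keep position tracking, though, so I haven't switched to this yet.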
Example of sampleData.csv:
COLUMN1;COLUMN2;COLUMN3
DATA1;DATA2;DATA3
sampleFile.conf:
input {
  file {
    path => "/app/data/sampleData*.csv"
    start_position => "beginning"
    add_field => { "[@metadata][appname]" => "sampleData" }
    sincedb_path => "/var/lib/logstash/data/last/.sampleData"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

filter {
  if [@metadata][appname] == "sampleData" {
    csv {
      separator => ";"
      columns => ["column1", "column2", "column3"]
    }
    # drop the header row
    if [column1] == "COLUMN1" { drop {} }
    mutate {
      remove_field => ["@version", "@timestamp", "message", "path", "host"]
    }
  }
}

output {
  if [@metadata][appname] == "sampleData" {
    jdbc {
      connection_string => "jdbc:postgresql://X.X.X.X:5432/sampleData"
      username => "username"
      password => "password"
      max_pool_size => "3"
      statement => [ 'INSERT INTO schema.sampleData ("column1","column2","column3") VALUES (?, ?, ?)', "column1", "column2", "column3" ]
    }
  }
}
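Since the file is regenerated every day rather than appended to, I'm also thinking about trying the file input's read mode (available in newer versions of the file input plugin, if I understand the docs right), which treats each matched file as finite content instead of tailing it. A sketch of what I have in mind; the file_completed_* values are my own assumptions, not something I run today:
input {
  file {
    path => "/app/data/sampleData*.csv"
    add_field => { "[@metadata][appname]" => "sampleData" }
    # "read" treats each matched file as complete content, not a tailed log
    mode => "read"
    # keep the source file and just log its name once it has been read
    # (the default action in read mode is "delete")
    file_completed_action => "log"
    file_completed_log_path => "/var/lib/logstash/data/last/sampleData_completed.log"
    sincedb_path => "/var/lib/logstash/data/last/.sampleData"
  }
}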
Things I've tried so far:
- Changing the charset option in sampleFile.conf.
- Removing the codec section from sampleFile.conf.
- Checking whether the CSV file contains any special characters or malformed data.
- Checking the EOL markers of the CSV file.
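One more check I'm planning: a temporary stdout output alongside the jdbc one, so I can compare the number of events Logstash actually emits with the line count of the file:
output {
  if [@metadata][appname] == "sampleData" {
    # temporary debug output: print every parsed event so the emitted
    # event count can be compared with the line count of the CSV
    stdout { codec => rubydebug }
  }
}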
I haven't found a proper solution yet. Has anyone come across the same situation? Any advice would be appreciated.
Thanks