CSV filter, one line of data is missing

resuhp · April 9, 2018, 9:10pm

Hi,

First of all, I'm very new to the ELK Stack and I'm having trouble understanding what I'm doing wrong.
I have a problem after indexing with Logstash. Say I have one log file with 1000 lines of data plus a first row containing the column headers.
When I visualize the data for that log file in Kibana, I see only 999 lines, so one line of data is missing.

If you have any advice, please help.

Here are my log file and my Logstash configuration with the CSV filter.

log.csv

Number|Order Name|Timestamp|Duration|City|Customer Name|Flag
1|Order XY|1522980960678|22.033333|New York|Customer XY|Int
1|Order XY|1522980982679|22.033333|Paris|Customer XY|Int
2|Order XY|1522981005681|22.033333|Milano|Customer XY|Int
2|Order XY|1522981028681|22.033333|Berlin|Customer XY|Int
1|Order XY|1522981051679|22.033333|Paris|Customer XY|Int
1|Order XY|1522981074679|22.033333|London|Customer XY|Int
2|Order XY|1522981097678|22.033333|Washington|Customer XY|Int
2|Order XY|1522981120679|22.033333|Dubai|Customer XY|Int
1|Order XY|1522981143680|22.033333|Madrid|Customer XY|Int
1|Order XY|1522981166680|22.033333|Rome|Customer XY|Int
.
.
2|Order XY|1522981120679|22.033333|Dubai|Customer XY|Int
22|Order XY|1522981143680|22.033333|Madrid|Customer XY|Int
1001|Order XY|1522981166680|22.033333|Rome|Customer XY|Int

//

input {
    file {
        path => "/opt/logstash/elk_stack/csv_files/test.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        separator => "|"
        columns => [ "Number", "Order Name", "Timestamp", "Duration", "City", "Customer Name", "Flag" ]
    }

    # The csv filter emits every column as a string, so only the numeric columns need converting.
    mutate {
        convert => {
            "Number" => "integer"
            "Timestamp" => "integer"
            # Duration values such as 22.033333 are fractional; "integer" would truncate them.
            "Duration" => "float"
        }
    }
}
output {
    elasticsearch {
        hosts => "localhost"
        index => "test"
        document_type => "test"
    }
    stdout {}
}

yaauie · April 9, 2018, 9:47pm

First, document_type is deprecated in Elasticsearch; the recommendation is to use the special value _doc so that your indices survive the upgrade to Elasticsearch 7.0 when it is released.

Next, I would highly recommend supplying a document_id template, which would allow subsequent imports to intentionally overwrite the previous version of the event; if left unspecified, a new unique id is generated for each document so you'll end up with duplicates on subsequent imports.

Third, does your input have a trailing newline? If not, the input may be holding the last line in memory, waiting for more; it only emits new chunks to the codec when it encounters a newline character.
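
To see why a missing final newline can cost exactly one event, note that line-oriented tools count newline characters rather than rows of text. A minimal sketch, using a made-up two-row sample in a temporary file rather than the real log:

```shell
#!/usr/bin/env bash
# Illustration only: a tiny pipe-separated sample, not the real log file.
sample="$(mktemp)"

# Header plus two data rows, but NO newline after the last row.
printf 'Number|City\n1|Paris\n2|Rome' > "$sample"
without_newline=$(wc -l < "$sample" | tr -d ' ')

# Appending the newline terminates the last row.
printf '\n' >> "$sample"
with_newline=$(wc -l < "$sample" | tr -d ' ')

echo "terminated lines without trailing newline: $without_newline"
echo "terminated lines with trailing newline:    $with_newline"
rm -f "$sample"
```

The first count comes out one lower than the number of rows in the file, which matches the 999-of-1000 symptom described above.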

In a bash-compatible shell, the following will print a 1 to your console if the file has a trailing newline, and a 0 if it does not:

tail -n 1 "$yourfile" | wc -l

If your file does not end with a newline, you can append one in-place with:

echo "" >> "$yourfile"

And finally, since you have both the source and the result, can you tell which line is missing? Is there anything special about this line?

resuhp · April 12, 2018, 7:12am

Hi, yaauie!

Thanks a lot for your help!

The problem was the missing trailing newline. After I appended one to the file, the entries in Kibana are complete and correct. Do you have any advice on the best way to automate adding a newline at the end of each log file? I receive one file per day, and of course the newline has to be added before indexing.

Regards

yaauie · April 12, 2018, 7:08pm

@resuhp, after looking around, I failed to find any pre-made scripts that do just this. Most forums recommend some variant of sed -i (the Stream EDitor with its in-place flag), but it doesn't actually edit the file in place from an inode perspective: it writes a new file and renames it over the original. Since Logstash uses the inode to track where it left off in a given file, that could produce odd or unexpected effects.
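
The inode difference is easy to observe. The following sketch assumes GNU sed (BSD/macOS sed needs `sed -i ''` instead) and uses a throwaway temp file; the helper name `inode_of` is made up for the example:

```shell
#!/usr/bin/env bash
# Assumes GNU sed; BSD/macOS sed needs `sed -i ''` instead of `sed -i`.
demo="$(mktemp)"
printf 'a|b\n1|2' > "$demo"          # last row has no trailing newline

# Hypothetical helper: print the inode number of the given path.
inode_of() { ls -i "$1" | awk '{print $1}'; }

before=$(inode_of "$demo")

# Appending with >> writes to the existing file, so the inode is unchanged.
echo "" >> "$demo"
after_append=$(inode_of "$demo")

# sed -i writes a new file and renames it over the old one, so the
# path now points at a different inode.
sed -i -e 's/a/A/' "$demo"
after_sed=$(inode_of "$demo")

echo "original:  $before"
echo "after >>:  $after_append"
echo "after sed: $after_sed"
rm -f "$demo"
```

The first two numbers should match and the third should differ, which is why appending with >> is the safer choice for a file that Logstash is tracking.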

The following script will append a newline to the given file in-place if and only if the file doesn't already end with a newline. It is Bash-based and uses only POSIX-compliant invocations of POSIX-standard tools available on most systems (wc, tail, and echo), so you should be able to use it just about anywhere.

https://gist.github.com/yaauie/59fae7c13972135fae7be6aad46ddb9e

ensure-trailing-newline.bash

#!/usr/bin/env bash -e
#
# Ensure Trailing Newline: ensures that the given plaintext file ends with a
# newline character, appending in-place only if it is missing.
#
# Portable on POSIX-based or POSIX-compatible systems, as it uses POSIX-standard
# invocations of `wc`, `tail`, and `echo`.
#
# Copyright 2018 Ry Biesemeyer
#

This file has been truncated; the full script is in the gist linked above.
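
Since the excerpt is truncated, here is an independent minimal sketch of the same idea applied to a whole directory of daily files. It is not the contents of the gist: the function name `ensure_trailing_newline` and the demo directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Independent sketch, NOT the contents of the truncated gist.

# Append a newline to "$1" only if its last byte is not already a newline.
# Appending with >> keeps the same inode, unlike sed -i.
ensure_trailing_newline() {
    local file="$1"
    [ -s "$file" ] || return 0    # leave empty files alone
    # Command substitution strips a trailing newline, so the result is
    # non-empty only when the last byte is something else.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

# Demo on a throwaway directory; in practice this would be the directory
# that receives the daily files (hypothetical usage).
csv_dir="$(mktemp -d)"
printf 'Number|City\n1|Paris' > "$csv_dir/missing.csv"     # no trailing newline
printf 'Number|City\n1|Paris\n' > "$csv_dir/complete.csv"  # already terminated

for f in "$csv_dir"/*.csv; do
    ensure_trailing_newline "$f"
done

missing_lines=$(wc -l < "$csv_dir/missing.csv" | tr -d ' ')
complete_lines=$(wc -l < "$csv_dir/complete.csv" | tr -d ' ')
echo "missing.csv: $missing_lines lines, complete.csv: $complete_lines lines"
rm -rf "$csv_dir"
```

Running the function twice on the same file is harmless, so a loop like this could be scheduled (for example from cron) to run whenever the daily file arrives.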

resuhp · April 17, 2018, 7:10am

yaauie, thanks a lot for your help!

