CSV filter, one line of data is missing

Hi,

First of all, I'm very new to the ELK Stack and am having trouble understanding what I'm doing wrong.
I have a problem after indexing with Logstash. Say I have one log file with 1000 lines of data, plus a first row containing the column headers.
When I visualize the data for that log file in Kibana, I see only 999 lines, so one line of data is missing.

If you have any advice please help.

Here are my log file and my Logstash config with the CSV filter.

log.csv

Number|Order Name|Timestamp|Duration|City|Customer Name|Flag
1|Order XY|1522980960678|22.033333|New York|Customer XY|Int
1|Order XY|1522980982679|22.033333|Paris|Customer XY|Int
2|Order XY|1522981005681|22.033333|Milano|Customer XY|Int
2|Order XY|1522981028681|22.033333|Berlin|Customer XY|Int
1|Order XY|1522981051679|22.033333|Paris|Customer XY|Int
1|Order XY|1522981074679|22.033333|London|Customer XY|Int
2|Order XY|1522981097678|22.033333|Washington|Customer XY|Int
2|Order XY|1522981120679|22.033333|Dubai|Customer XY|Int
1|Order XY|1522981143680|22.033333|Madrid|Customer XY|Int
1|Order XY|1522981166680|22.033333|Rome|Customer XY|Int
.
.
2|Order XY|1522981120679|22.033333|Dubai|Customer XY|Int
22|Order XY|1522981143680|22.033333|Madrid|Customer XY|Int
1001|Order XY|1522981166680|22.033333|Rome|Customer XY|Int

//

input {
    file {
        path => "/opt/logstash/elk_stack/csv_files/test.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
     }
}
filter {
    csv {

        separator => "|"

        columns => [ "Number", "Order Name", "Timestamp", "Duration", "City", "Customer Name", "Flag" ]
     }

     mutate {convert => ["Number", "integer"] }
     mutate {convert => ["Order Name", "string"] }
     mutate {convert => ["Duration", "integer"] }
     mutate {convert => ["City", "string"] }
     mutate {convert => ["Customer Name", "string"] }
     mutate {convert => ["Flag", "string"] }
     mutate {convert => ["Timestamp", "integer"] }
}
output {
    elasticsearch {
        hosts => "localhost"
        index => "test"
        document_type => "test"
     }
     stdout {}
}

First, document_type is deprecated in Elasticsearch; the recommendation is to use the special value _doc, which will help your indices survive the upgrade to Elasticsearch 7.0 when it comes out later this year.

Next, I would highly recommend supplying a document_id template, which would allow subsequent imports to intentionally overwrite the previous version of the event; if left unspecified, a new unique id is generated for each document so you'll end up with duplicates on subsequent imports.
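As a rough sketch of what such a template could look like, the fragment below hashes each raw line with the fingerprint filter and uses the hash as the document id. The choice of the fingerprint filter, the SHA256 method, and the [@metadata][fingerprint] field name are assumptions for illustration, not something from your config, and some versions of the plugin require a key option for the SHA methods. Note that two byte-identical lines in the file would then collapse into a single document, so this only fits if every line is meant to be unique.

```
filter {
    fingerprint {
        # Hash the raw line; the result goes into @metadata so it is not indexed as a field.
        source => "message"
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}
output {
    elasticsearch {
        hosts => "localhost"
        index => "test"
        # Re-importing the same line overwrites the existing document instead of duplicating it.
        document_id => "%{[@metadata][fingerprint]}"
    }
}
```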

Third, does your input have a trailing newline? If not, the input may be holding the last line in memory, waiting for more; it only emits new chunks to the codec when it encounters a newline character.

In a bash-compatible shell, the following will print a 1 to your console if the file has a trailing newline, and a 0 if it does not:

tail -n 1 "$yourfile" | wc -l

If your file does not end with a newline, you can append one in-place with:

echo "" >> "$yourfile"

And finally, since you have both the source and the result, can you tell which line is missing? Is there anything special about this line?

Hi, yaauie!

Thanks a lot for your help!

The problem was the missing trailing newline. After I appended one to the file, the entries in Kibana are correct. Do you have any advice on the best way to automate adding a newline at the end of each log file? I receive one file per day, and the newline has to be added before indexing.

Regards

@resuhp After looking around, I failed to find any pre-made scripts that do just this. Most forums recommend some variant of sed -i (the Stream EDitor with its in-place flag), but that doesn't actually edit the file in place from an inode perspective: it writes a new file and renames it over the original. Since Logstash uses the inode to track where it left off in a given file, that could produce odd or unexpected effects.

The following script will append a newline to the given file in-place if and only if the file doesn't already end with a newline. It is Bash-based and uses only POSIX-compliant invocations of POSIX-standard tools available on most systems (wc, tail, and echo), so you should be able to use it just about anywhere.
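The script itself is not preserved in this copy of the thread. A minimal sketch matching the description above, using only wc, tail, and echo, might look like the following. The function name ensure_trailing_newline and the decision to skip missing or empty files are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: append a newline to each given file, but only if it lacks one.

ensure_trailing_newline() {
    local file="$1"

    # Assumption: missing or empty files are left untouched.
    [ -s "$file" ] || return 0

    # tail -c 1 emits the last byte of the file; wc -l counts the
    # newline characters in it, giving 1 if the file ends with a
    # newline and 0 otherwise.
    local newlines
    newlines=$(tail -c 1 "$file" | wc -l)

    if [ "$newlines" -eq 0 ]; then
        # >> appends to the existing file, so the inode that Logstash
        # uses to track its read position stays the same.
        echo "" >> "$file"
    fi
}

# Process every file passed on the command line.
for f in "$@"; do
    ensure_trailing_newline "$f"
done
```

Saved under a hypothetical name such as ensure_newline.sh, it could be run from cron or from whatever job delivers the daily file, for example as ./ensure_newline.sh /opt/logstash/elk_stack/csv_files/*.csv, before Logstash picks the file up.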

yaauie, thanks a lot for your help!
