If I have a CSV file that gains additional rows every 24 hours, is there a way that I can have ELK upload only the additional rows? I need a setup in which I can keep changing the CSV file and not have ELK re-upload everything in that file from the start.
Are the new rows being appended to the file, or is the file being replaced with a new file that has more lines?
If the former, then the file input plugin should work for you just fine without any special configuration; it keeps track of the file, its inode, and how far it has read in order to avoid re-emitting lines that it has already processed.
The file is being replaced with another file with more lines in it.
I don't understand what you mean by the former option. If I edit a CSV using the vim command line and add rows to it. Isn't that the same as the entire file being replaced?
It depends on how vim is configured; this answer may be helpful.
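For what it's worth, vim's backupcopy option controls whether a save overwrites the file in place (keeping the same inode, which is what the file input tracks) or writes a new file and renames it over the old one (new inode). A minimal sketch, assuming vim here; check :help backupcopy for your own setup:
# with backupcopy=yes, vim overwrites the original file in place on :w,
# so the inode stays the same and rows appended at the end are picked up as new lines
echo 'set backupcopy=yes' >> ~/.vimrc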
You can append data onto the tail of a file's existing inode on the command line using the double shovel operator (>>); suppose you have a data.csv containing all of your existing data and a new-lines.csv that contains some new lines:
cat new-lines.csv >> data.csv
Caveat: in the above scenario, both files must end with a trailing newline for this to work repeatably.
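To confirm that the append really reuses the same inode rather than creating a new file, you can compare inode numbers before and after; a quick check (file names are just the ones from the example above):
ls -i data.csv                  # note the inode number
cat new-lines.csv >> data.csv   # append the new rows
ls -i data.csv                  # should print the same inode number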
When testing new things, I tend to use the stdout output plugin with the rubydebug codec (which splats out all of the fields in an event in a somewhat human-readable format), which lets me see what Logstash is doing.
The following pipeline configuration uses the File Input to read lines from a file, the CSV Filter to extract the CSV data, and an STDOUT Output with a RubyDebug Codec to output events to stdout:
input {
  file {
    path => "/path/to/your/data.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    # ...
  }
  # add other filters here to mutate your data
}
output {
  stdout {
    codec => rubydebug
  }
}
Then I would start up Logstash and leave it running in a screen session or a separate console tab; at this point I should observe it process the existing lines and wait for more. Then, I would append lines to the data.csv using the method I wanted to test and observe whether Logstash processed only the new lines or if it re-processed the old lines.
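Concretely, such a test run might look like the sketch below; the config file name and the appended row are made up for illustration, and the path to the logstash binary depends on how you installed it:
# terminal 1: run Logstash with the pipeline above and leave it running
bin/logstash -f csv-test.conf
# terminal 2: append one row, then watch terminal 1 for exactly one new event
echo 'some,new,row' >> /path/to/your/data.csv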
Warning: the File input keeps track of where it left off across restarts using a persistent checkpoint file called a sincedb (relevant docs here).
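If you want the sincedb somewhere predictable (for example, so you can delete it while testing to force a full re-read), the file input accepts a sincedb_path option; a minimal sketch, with an illustrative path:
input {
  file {
    path => "/path/to/your/data.csv"
    start_position => "beginning"
    # checkpoint file; delete it (with Logstash stopped) to re-read from the beginning
    sincedb_path => "/path/to/your/data.sincedb"
  }
}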
It depends. If you are using an output that is capable of deduping on id (e.g., Elasticsearch, JDBC), you can use the fingerprint filter plugin to generate a consistent id from each line. Logstash would then reprocess the lines from before the edit, but your data store would be able to ensure that you don't end up with duplicates.
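As a rough sketch of that approach (the field, index, and host below are assumptions, not something from this thread): the fingerprint filter hashes the raw line into a metadata field, and the Elasticsearch output uses that hash as the document id, so re-processed lines overwrite themselves instead of creating duplicates:
filter {
  fingerprint {
    # the same input line always produces the same id
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"   # some versions want a `key` (HMAC) here; MURMUR3 also works
  }
}
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "csv-data"
    document_id => "%{[@metadata][fingerprint]}"
  }
}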