Updating existing fields with new data pulled from .csv

(KK) #1


I'm trying to update existing fields with new data, and I'm not clear whether the upsert or the update API is the way to go (I'm also a little confused about how exactly to implement either one for my use case).

Every week, I run the following: logstash -f C:\ELK\LOGCONF\Statistics.conf

The Statistics.conf file points to a .csv file that is written to a local path:
path => ["C:\cygwin64\home\Admin\Statistics.csv"]

I am using the Date filter in Logstash, since my data changes week to week, and I simply want each following week's values to "update" the index and fill in alongside the prior entries.

In other words:

On September 17th (call it Week 1), I run logstash -f C:\ELK\LOGCONF\Statistics.conf
Some values are populated from the .csv, shown here in the following 4 columns.

Date Score Conditions Week

Sep-17 10 Good Week-1

On September 24th (call it Week 2),
I run logstash -f C:\ELK\LOGCONF\Statistics.conf again.

Below is what I would like to see. Basically, I just want to add data/values to the previously established column fields extracted from the .csv.

Date Score Conditions Week

Sep-17 10 Good Week-1
Sep-24 20 Better Week-2 (add new data pulled from .csv)

and on October 1st (Week 3), again poll the .csv to add the new values:

Oct-1 30 Excellent Week-3

Another point to note is that my Elasticsearch is temporarily mobile, so I press Ctrl-C to stop each weekly logstash -f run, and only resume the following week with the new data, once the new "weekly set" of statistics has been generated and the .csv is made available.

Again, in this example, on October 1st, I would run the logstash -f command again, and hope to continue adding data incrementally as I go. The problem I am trying to solve is that currently, when I import a new week's worth of data, I get duplicate documents in the Discover tab, irrespective of whether I use the Date or the timestamp.

In other words, I get the new data, but I also still have the old entries, so I am left with duplicates everywhere.

Is there a way to do this for my use case? Basically, I want to poll a .csv and pull new data into the previously established column fields.
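
To make the goal concrete, here is a minimal sketch of the kind of pipeline I have in mind, where each row gets a stable document ID so that re-importing it updates the existing document instead of creating a duplicate. It is not a tested configuration. The host, the index name "statistics", the forward-slash path, the sincedb setting, and the choice of the Week column as the unique key are all assumptions for illustration.

```
input {
  file {
    # Assumption: forward slashes, since the file input treats path as a glob pattern
    path => ["C:/cygwin64/home/Admin/Statistics.csv"]
    start_position => "beginning"
    # Assumption: do not remember the read position, so the whole file is re-read on each weekly run
    sincedb_path => "NUL"
  }
}

filter {
  csv {
    # Column names taken from the example table above
    columns => ["Date", "Score", "Conditions", "Week"]
  }
  # The existing date filter would stay here unchanged
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumption: local Elasticsearch
    index => "statistics"         # hypothetical index name
    # Assumption: Week is unique per row, so it can serve as the document ID
    document_id => "%{Week}"
    # Update the document if the ID exists, otherwise create it
    action => "update"
    doc_as_upsert => true
  }
}
```

With a setup like this, re-reading the Week-1 row on a later run should overwrite the Week-1 document rather than add a second copy, while Week-2 and Week-3 rows create new documents. Duplicates already in the index from earlier runs would still need to be cleared separately.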
