Merging CSV files based on a common key column


(Rakesh Manjunath) #1

Hello All,

I have been reading the documentation and various posts on elastic community, but still difficult to find solution to this problem. I have 2 csv files, I need to merge into single csv file and inject the data into elasticsearch. I have read through filter plugin , translate plugin but still i need help. I am stuck with same problem for 3 weeks. Below is my config file.

I need help on the filter part where i can use compare a particular column thats user_id if user_id matches in both csv files, then i need to put all data from both csv files into single csv file. Please help me out.

input {
file {
type => "transaction"
path => "C:\Users\Desktop\DATA\TRANSACTION_TIME.csv "
start_position => "beginning"
sincedb_path => "/dev/null"
}

  file {
               type => "subscriber"
	   path => "C:\Users\Desktop\DATA\SUBSCRIBER.csv "
	   start_position => "beginning"
	   sincedb_path => "/dev/null"
        }
   }

####### FILTER ####################################################

filter {
csv {
separator => ","
}
}

####### OUTPUT ###################################################

output {
elasticsearch {
hosts => "http://localhost:9200"
index => "data_insights"
document_type => "data"
}
stdout {}
}


#2

Suppose we have a CSV that looks like this, where the second field is the key.

some,foo,other,words
more,bar,stuff,here

We can use logstash to convert this to something that looks like

foo,"some,foo,other,words"
bar,"more,bar,stuff,here"

Using something like this:

input { file { path => "/path/to/foo.csv" start_position => "beginning" sincedb_path => "/dev/null" } }
filter { csv {} }
output { stdout { codec => plain { format => '%{column2},"%{message}"
' } } }

Then you can use that as the input to a translate filter

    translate {
        dictionary_path => "/path/to/another.csv"
        field => "column1"
    }
    csv {
        source => "translation"
        columns => [ "c1", "c2", "c3", "c4" ]
    }

If you have quoted fields in the first CSV file then you would have to gsub to something else in that first logstash and gsub them back in the second.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.