I want to load a poorly formatted CSV file into Elasticsearch. The file is the output of the CSV indexer from Nutch 1.15 and has rows like this:
http://blackseamap.com/bshome/feed/,Comments on: Black Sea Home,"Comments on: Black Sea Home
Comments on: Black Sea Home
Dive into the mysterious waters of the Black Sea and discover what stories can be revealed.
"
http://blackseamap.com/careers-in-action/,Black Sea M.A.P – Maritime Archaeology Project | Careers in Action,"Black Sea M.A.P – Maritime Archaeology Project | Careers in Action
En
English
Bulgarian
The Mission
The Team
Education
Education Home
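For what it's worth, the raw file does look like valid CSV with quoted multi-line fields. A quick sanity check with Python's csv module (using the first record above) parses it into three fields, so the data itself seems fine once the lines are reassembled:

```python
import csv
import io

# First record from the Nutch CSV output above: url, title, and a
# quoted content field that contains embedded newlines.
sample = (
    'http://blackseamap.com/bshome/feed/,'
    'Comments on: Black Sea Home,'
    '"Comments on: Black Sea Home\n'
    'Comments on: Black Sea Home\n'
    'Dive into the mysterious waters of the Black Sea '
    'and discover what stories can be revealed.\n'
    '"\n'
)

rows = list(csv.reader(io.StringIO(sample)))
print(len(rows), len(rows[0]))  # one record, three fields
```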
I have built the Logstash config below. My hope is that the multiline codec concatenates the wrapped lines into a single event, and the csv filter then parses out the url, title, and content. However, it never identifies the fields separated by the commas: I just end up with the message field and no id, title, or content. What have I missed?
input {
  file {
    path => "/home/monkstown/Nutch/nutch1.15/csvindexwriter/nutch.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^http"
      negate => true
      what => "previous"
    }
  }
}
filter {
  csv {
    separator => ","
    columns => ["id","title","content"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "oatest1"
    document_type => "oa_basic"
  }
  stdout {}
}
Thank you!