I want to load a poorly formatted CSV file into Elasticsearch. The file is an indexer-csv export from Nutch 1.15, with rows like the following:
http://blackseamap.com/bshome/feed/,Comments on: Black Sea Home,"Comments on: Black Sea Home
Comments on: Black Sea Home
Dive into the mysterious waters of the Black Sea and discover what stories can be revealed.
"
http://blackseamap.com/careers-in-action/,Black Sea M.A.P – Maritime Archaeology Project | Careers in Action,"Black Sea M.A.P – Maritime Archaeology Project | Careers in Action
En
English
Bulgarian
The Mission
The Team
Education
Education Home
I have built the Logstash config below. My hope is that the multiline codec concatenates the continuation lines into one event, and that the csv filter then parses out the id, title, and content. But it never identifies the fields separated by the commas; I just end up with the raw message and no id, title, or content. What have I missed?
input {
	file {
		path => "/home/monkstown/Nutch/nutch1.15/csvindexwriter/nutch.csv"
		start_position => "beginning"
		sincedb_path => "/dev/null"
		# Join every line that does NOT start with "http" onto the previous line
		codec => multiline {
			pattern => "^http"
			negate => true
			what => "previous"
		}
	}
}
filter {
	csv {
		separator => ","
		columns => ["id","title","content"]
	}
}
output {
	elasticsearch {
		hosts => "localhost"
		index => "oatest1"
		document_type => "oa_basic"
	}
	stdout {}
}
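For what it's worth, when I parse one joined record by hand it does split cleanly into three fields, which is what I expected the csv filter to do. Here is a quick Python sketch (the record string is the first row of my file after the multiline join, with continuation lines glued on with `\n`):

```python
import csv
import io

# One logical record after multiline joining: lines that don't start
# with "http" have been appended to the previous line with "\n".
record = (
    'http://blackseamap.com/bshome/feed/,'
    'Comments on: Black Sea Home,'
    '"Comments on: Black Sea Home\n'
    'Comments on: Black Sea Home\n'
    'Dive into the mysterious waters of the Black Sea and discover '
    'what stories can be revealed.\n"'
)

# The quoted third field contains embedded newlines; csv handles that.
row = next(csv.reader(io.StringIO(record)))
print(len(row))   # -> 3
print(row[0])     # -> http://blackseamap.com/bshome/feed/
print(row[1])     # -> Comments on: Black Sea Home
```

So the data itself seems parseable once the lines are joined; the problem appears to be somewhere in how my Logstash pipeline handles it.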
Thank you!