CSV Logstash


#1

I was trying to import a CSV file into Elasticsearch through Logstash. But when I looked at the data in Kibana, I found that the first line of the CSV file (the header with the column names) was indexed as a data record instead of being treated as column names.

This is the screenshot of kibana.

These are my csv file and config file.

ID,TweetDate,Tweet Time,Brand  Name,Original Tweet URL,Tweet Text only,Tweet Image,Tweet URL,Tweet Image + URL,Tweet Video,Tweet Video + URL,1.Brand Awareness,1.Type,2.CSR,2.Type,3.Customer Service,3.Type,4.Engagement,4.Type,5.Product Awareness,5.Type,6.Promotional,6.Type,7.Seasonal,7.Type,Appeal-Transformational,Appeal-Informational,# Likes,# Mentions,# Retweets,TweetDateTime
1,2018/2/15,6:49am,WebMD,https://twitter.com/WebMD/status/964152218059464705,0,0,0,1,0,0,0,0,1,1,1,4,1,4,0,0,0,0,0,0,0,1,17,0,19,2018-2-15 6:49
2,2018/2/15,2:00am,WebMD,https://twitter.com/WebMD/status/964076827852582912,0,0,0,1,0,0,0,0,1,1,1,4,1,1,0,0,0,0,1,3,0,1,77,3,48,2018-2-15 2:00
3,2018/2/14,9:00am,WebMD,https://twitter.com/WebMD/status/963820299010629635,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,2,1,0,43,0,16,2018-2-14 9:00
4,2018/2/15,1:29pm,WebMD,https://twitter.com/WebMD/status/964250343294095360,0,0,0,1,0,0,0,0,1,1,1,4,0,0,0,0,0,0,1,3,0,1,4,4,25,2018-2-15 13:29


input {
	file {
		path => "/Users/apple/Desktop/ITM444/logstash_TW/webmd_twitter.csv"
		start_position => "beginning"
		sincedb_path => "/dev/null"
	}
}
filter {
	csv {
		separator => ","
		columns => ["ID", "TweetDate", "Tweet Time", "Brand  Name", "Original Tweet URL", "Tweet Text only", "Tweet Image", "Tweet URL", "Tweet Image + URL", "Tweet Video", "Tweet Video + URL", "1.Brand Awareness", "1.Type", "2.CSR", "2.Type", "3.Customer Service", "3.Type", "4.Engagement", "4.Type", "5.Product Awareness", "5.Type", "6.Promotional", "6.Type", "7.Seasonal", "7.Type", "Appeal-Transformational", "Appeal-Informational", "# Likes", "# Mentions", "# Retweets", "TweetDateTime"]
	}
	mutate {convert => ["# Likes", "integer"]}
	mutate {convert => ["# Mentions", "integer"]}
	mutate {convert => ["# Retweets", "integer"]}
	
	date{
		match => ["TweetDateTime", "yyyy-MM-dd HH:mm"]
	}

}
output {
	elasticsearch {
		hosts => "localhost:9200"
		index => "webmd"
		document_type => "twitter"
	}
	stdout { codec => rubydebug }
}

(Mark Walkom) #2

I think the question itself is missing here. What would you like assistance with? :slight_smile:


#3

As you can see, my CSV file has 4 data rows plus one header line with the column names.

But when I import the file, it is recognized as 5 data items (including the header line).

Is there any way to avoid this?


(Mark Walkom) #4

Ok, then https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html#plugins-filters-csv-autodetect_column_names should do what you want.
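
For reference, here is a minimal sketch of how that option might be applied to the filter block from the first post. It assumes a version of the csv filter plugin that supports both `autodetect_column_names` and `skip_header`. The plugin documentation also notes that column autodetection needs the pipeline to run with a single worker (for example `pipeline.workers: 1` in logstash.yml, or `-w 1` on the command line).

```
filter {
	csv {
		separator => ","
		# Take the column names from the first line of the file
		# instead of listing them by hand in "columns".
		autodetect_column_names => true
		# Skip rows that match the header so they are not indexed as documents.
		skip_header => true
	}
}
```

Alternatively, keeping the explicit `columns` list and adding only `skip_header => true` should, according to the plugin documentation, skip any row that exactly matches the listed column names.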
