Logstash is not loading data to elasticsearch


(Daniella Kuntzman) #1

I'm trying to load data from Redshift into Elasticsearch.
I exported some data from Redshift to a CSV file, so I thought Logstash with the csv filter would work.
It looks like it's working: the sincedb file reaches the end of the file. After that (when I run Logstash locally with the debug flag), I see these lines printed over and over, and it looks as if it will never stop:

_globbed_files: /Users/bla/Downloads/0001*: glob is: ["/Users/bla/Downloads/0001_part_00"] {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"346", :method=>"_globbed_files"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"458", :method=>"flush"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"458", :method=>"flush"}
Pushing flush onto pipeline {:level=>:debug, :file=>"logstash/pipeline.rb", :line=>"458", :method=>"flush"}

I checked the ES index and it wasn't updated.
I figure I'm doing something wrong with my config file so here it is:

`input {
  file {
    path => "/Users/bla/Downloads/0001*"
    start_position => "beginning"
    ignore_older => 0
  }
}

filter {
  csv {
    columns => [sessionid, useridentify, firstproject, impressionproject, browsertype, browserversion, os, osversion, oslanguage, devicetype, devicevendor, devicemodel, deviceid, location, screenres, searchengine, bandwidthmeasure, interludeplayer, interludeplayerversion, interludeplayerengine, publisherid, context, referrer, clustername, treehouse_buttonclickcount, pathid, projectcount, desktopviewcount, iosviewcount, andviewcount, sharecount, replaycount, viewcount, viewendcount, interactioncount, noninteractioncount, subscribecount, subscribedisplaycount, subscribeentercount, channelswapcount, ekosubdisplaycount, ekosubintentcount, ekosubsuccesscount, achievementcount, linkoutcount, adskips, adcompletes, eventcount, mineventtime, maxeventtime, minusertime, maxusertime, sessiontotaltime, domloadingtime, domcompletetime, appreadytime, playerloadedtime, city, region, country, latitude, longitude, subscribesource]
    convert => {
      "mineventtime" => "date_time"
      "maxeventtime" => "date_time"
      "minusertime" => "date_time"
      "maxusertime" => "date_time"
      "browserversion" => "integer"
      "treehouse_buttonclickcount" => "integer"
      "projectcount" => "integer"
      "desktopviewcount" => "integer"
      "iosviewcount" => "integer"
      "andviewcount" => "integer"
      "sharecount" => "integer"
      "replaycount" => "integer"
      "viewcount" => "integer"
      "viewendcount" => "integer"
      "interactioncount" => "integer"
      "noninteractioncount" => "integer"
      "subscribecount" => "integer"
      "subscribedisplaycount" => "integer"
      "subscribeentercount" => "integer"
      "channelswapcount" => "integer"
      "ekosubdisplaycount" => "integer"
      "ekosubintentcount" => "integer"
      "ekosubsuccesscount" => "integer"
      "achievementcount" => "integer"
      "linkoutcount" => "integer"
      "adskips" => "integer"
      "adcompletes" => "integer"
      "eventcount" => "integer"
      "sessiontotaltime" => "integer"
      "nodestarts" => "integer"
      "nodeends" => "integer"
      "latitude" => "float"
      "longitude" => "float"
    }
    separator => "|"
    remove_field => ["message"]
    skip_empty_columns => true
  }
}

output {
  amazon_es {
    action => "index"
    index => "sessions"
    document_type => "session"
    document_id => "%{sessionid}"
    hosts => ["our-host-in-amazon.com"]
    region => "our-region"
    flush_size => 50
    idle_flush_time => 5
    protocol => "https"
    template => "../session_mapping.json"
    max_retries => 5
    codec => "json"
    # All the keys are here too.
  }
}
`

Thanks for the help :slight_smile:


(Magnus Bäck) #2

Use a stdout { codec => rubydebug } output to debug this and figure out if the problem is on the input side, i.e. if you can get Logstash to read the input file and emit events. If that works, re-enable the amazon_es output and redo the same operation.
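
A minimal sketch of that first debugging step, assuming the same input path and separator as the config above. The `sincedb_path => "/dev/null"` line is an addition that is not in the original config: it is a common way to discard the saved read position so that the file is read again on every test run. The column list is shortened here purely for illustration.

```
input {
  file {
    path => "/Users/bla/Downloads/0001*"
    start_position => "beginning"
    # Assumption, not in the original config: do not persist the read
    # position, so the file is re-read on each test run.
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => "|"
    # Shortened for illustration; use the full column list from the original config.
    columns => ["sessionid", "useridentify", "firstproject"]
  }
}

output {
  # Print each parsed event to the console instead of sending it to Elasticsearch.
  stdout { codec => rubydebug }
}
```

If events show up on the console, the input and filter are doing their job and the problem is more likely on the output side; if nothing is printed, the file input is not emitting events.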


(JL) #3

Hi,

So, to update you: I got it working. It's Logstash logstash-5.3.0-1.noarch.

I don't know what it was; it just took 10 minutes of kind of sitting there, and then my indices started populating.

I was able to make a pie chart showing port usage by popularity. However, I don't understand why it's working the way it does. The values don't make much sense to me.

My first visualization is here

I'm using the logstash netflow definitions.

 netflow_definitions => "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-netflow-3.3.0/lib/logstash/codecs/netflow/netflow.yaml"
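
For context, a sketch of where that setting typically sits, assuming the flows arrive over UDP. The port number 2055 is an assumption for illustration and is not stated anywhere above.

```
input {
  udp {
    # Hypothetical port; use whatever port the exporter actually sends flows to.
    port => 2055
    codec => netflow {
      netflow_definitions => "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-netflow-3.3.0/lib/logstash/codecs/netflow/netflow.yaml"
    }
  }
}
```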

If anyone has any pointers on getting the ball rolling with netflow dashboards / visualizations, I'd appreciate that, but this is more Kibana than Logstash.

