I left it running in read mode for almost an hour, with nothing showing up on the console past Logstash's startup lines.
When I tried the pipe input, within about five minutes the process had already eaten up twice my available RAM.
I've just got home; tomorrow when I get back I'll post the config, but it's a simple file input, CSV filter, Elasticsearch output conf.
While building it, I ran some 10 lines through with rubydebug output, and it was fine.
Edit: here's the simplified conf, without comments:
input {
  file {
    path => "(correct file path with escaped \ where needed)"
    mode => ["read"]
  }
}

filter {
  if [message] =~ "^#" {
    drop {}
  }
  mutate {
    gsub => ["message", "-", ""]
  }
  csv {
    columns => ["c-ip","cs-username","c-agent","sc-authenticated","date","time","s-computername","cs-referred","r-host","r-ip","r-port","time-taken","sc-bytes","cs-bytes","cs-protocol","s-operation","cs-uri","cs-mime-type","s-object-source","sc-status","rule","FilterInfo","sc-network","error-info","action","GMT Time","AuthenticationServer","ThreatName","UrlCategory","MalwareInspectionContentDeliveryMethod","MalwareInspectionDuration","internal-service-info","NIS application protocol","UrlCategorizationReason","SessionType","UrlDestHost","s-port","SoftBlockAction"]
    separator => " "
  }
  date {
    match => [ "GMT Time", "YYYYMMdd HH:mm:ss" ]
    timezone => "America/Sao_Paulo"
  }
  if [bytesSent] {
    ruby {
      code => "event['kilobytesSent'] = event['bytesSent'].to_i / 1024.0"
    }
  }
  if [bytesReceived] {
    ruby {
      code => "event['kilobytesReceived'] = event['bytesReceived'].to_i / 1024.0"
    }
  }
  mutate {
    convert => ["bytesSent", "integer"]
    convert => ["bytesReceived", "integer"]
    convert => ["timetaken", "integer"]
    add_field => { "clientHostname" => "%{r-ip}" }
    remove_field => [ "GMT Time" ]
  }
  dns {
    action => "replace"
    reverse => ["clientHostname"]
  }
  useragent {
    source => "useragent"
    prefix => "browser"
  }
}

output {
  elasticsearch {
    hosts => ["http://ip:9200"]
    index => "index-%{+YYYY.MM.dd}"
  }
  # stdout { codec => rubydebug }
}
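A side note on the two ruby blocks: I'm not certain the old event['field'] syntax still works on 6.8, since newer Logstash versions disabled direct event field references in favor of the get/set event API. A minimal sketch of the same kilobyte computation with that API (field names as in my conf):

ruby {
  # event.get / event.set instead of event['field'] references,
  # which newer Logstash versions reject
  code => "event.set('kilobytesSent', event.get('bytesSent').to_i / 1024.0)"
}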
As I've said, testing with a rubydebug stdout output on the first 5-20 lines of each file shows the events being parsed correctly.
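That test was shaped roughly like this (a minimal sketch; I'm assuming stdin as the input for the hand-fed lines, with the exact same filter block as in the conf above):

input {
  # paste sample log lines straight into the console
  stdin {}
}
# (same filter { ... } block as above)
output {
  # print each parsed event as a Ruby-style hash for inspection
  stdout { codec => rubydebug }
}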
I just started it again and here's the log:
Sending Logstash logs to /logstash-6.8.0/logs which is now configured via log4j2.properties
[2019-10-16T17:19:22,398][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-10-16T17:19:22,460][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.8.0"}
[2019-10-16T17:19:41,384][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-10-16T17:19:42,558][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://ip:9200/]}}
[2019-10-16T17:19:43,149][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://ip:9200/"}
[2019-10-16T17:19:43,320][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2019-10-16T17:19:43,336][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2019-10-16T17:19:43,399][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://ip:9200"]}
[2019-10-16T17:19:43,414][INFO ][logstash.outputs.elasticsearch] Using default mapping template
[2019-10-16T17:19:43,617][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2019-10-16T17:19:46,625][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/logstash-6.8.0/data/plugins/inputs/file/.sincedb_490c48491a0fc19c7297104da7cfc991", :path=>["double-escaped path"]}
[2019-10-16T17:19:46,719][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x42c201c8 run>"}
[2019-10-16T17:19:46,872][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2019-10-16T17:19:46,872][INFO ][filewatch.observingread ] START, creating Discoverer, Watch with file and sincedb collections
[2019-10-16T17:19:47,814][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2019-10-16T17:20:57,124][WARN ][logstash.runner ] SIGINT received. Shutting down.
[2019-10-16T17:20:57,327][INFO ][filewatch.observingread ] QUIT - closing all files and shutting down.
[2019-10-16T17:20:57,749][INFO ][logstash.pipeline ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0x42c201c8 run>"}
[2019-10-16T17:20:57,749][INFO ][logstash.runner ] Logstash shut down.
And I just noticed the "double escaped path" in that log. Thinking this could be the problem, I changed the path to a non-escaped string and started again; the log looks just the same, except that the path => value is now "normally" escaped. It started at [2019-10-16T17:22:02,449][WARN, ran up to [2019-10-16T17:22:20,949][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}, and is just sitting there doing nothing.
Procexp's performance graph confirms that Logstash is just sitting there, idle.
Edit: 30 minutes later, still nothing.
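Two more things I plan to try, in case they matter. First, forward slashes in the path: from what I can tell, the file input wants them even on Windows, because backslashes can be read as glob escapes. Second, a throwaway sincedb: the auto-generated one from the earlier runs (visible in the log above) may already mark these files as fully read, and in read mode that would mean they're silently skipped. Roughly this shape (placeholder path; "NUL" is my assumption for the Windows null device):

input {
  file {
    # hypothetical path standing in for the redacted one above,
    # with forward slashes even though this is Windows
    path => "C:/logs/proxy/*.log"
    mode => "read"
    # don't remember read state between runs ("/dev/null" on Linux)
    sincedb_path => "NUL"
    # log completed files instead of deleting them (read mode's default action)
    file_completed_action => "log"
    file_completed_log_path => "C:/logs/completed.log"
  }
}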