Basic logstash example for CSV

Hi, I'm trying to get my head around the ELK stack "thing" but I'm being defeated!

I have installed all the latest v5.x components (ELK) onto CentOS 7.

Now, I have tonnes of 100 MB files being FTP'd to the server, which I want Logstash to index into Elasticsearch so I can browse/search/analyze them in Kibana.
Specifically, they come from a content filter, so they're web access logs, and the data is pretty much in CSV format from what I can tell.

I have created a Logstash conf file to index one of the many files, for testing:

input {
    file {
        path => "/data/incomingdata/accesslogs.@20170320T130730.s"
        type => "Ironport"
    }
}

filter {
    # drop comment/header lines before they reach the csv parser
    if [message] =~ /^#/ {
        drop {}
    }
    csv {
        columns => ["col1","col2","col3"]
        separator => ","
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}

So, as I understand it, this reads the file, skips any lines beginning with '#', and pulls the first three comma-delimited columns into named fields.
Am I understanding this right?
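As an aside, I gather you can test just the filter section in isolation by swapping the file input for stdin and keeping only the stdout output, so a pasted line comes straight back as a parsed event. A rough sketch (same filter as above):

input {
    # stdin instead of file: each line you paste becomes one event
    stdin { }
}

filter {
    if [message] =~ /^#/ {
        drop {}
    }
    csv {
        columns => ["col1","col2","col3"]
        separator => ","
    }
}

output {
    # print the parsed event to the console
    stdout { codec => rubydebug }
}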

My problem is that Logstash restarts successfully, but no index gets created:

[root@svr-h003349 logstash]# curl 'localhost:9200/_cat/indices?v'
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana 9gCH5oUTS4W3RBSs_b9TlQ   1   1          1            0      3.1kb          3.1kb
[root@svr-h003349 logstash]# vim conf.d/wsa-h002606.conf

I'm a total newb with ELK, so I need some hand-holding! How can I proceed, please?

Thank you!

Best Regards,

Elliot

Okay, I have since realized that only a section of each log entry is actually comma-delimited, so it looks like I have to deal with grok.

So here's a sample line from an access log I have:

10.11.23.100 "MY-DOMAIN\jbloggs@NTLM" - [20/Mar/2017:13:07:26 +0000] "GET http://ads35.vertamedia.com/vast/vpaid-config/?width=300&height=250&aid=49253&sid=0&site_full_url=http%3A%2F%2Fwww.dailymail.co.uk%2Fhome%2Findex.html&top_domain=www.dailymail.co.uk&v=2.3.215&t=flash&cb=14900152435947154&video_duration=30 HTTP/1.1" 200 41 TCP_MISS:DIRECT 76 DEFAULT_CASE_12-Level_3_Users-Internal_Users-DefaultGroup-NONE-NONE-DefaultGroup <IW_busi,-3.0,1,"-",-,-,-,1,"-",-,-,-,"-",1,-,"-","-",-,-,IW_busi,-,"-","-","Unknown","Unknown","-","-",41.68,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> - 1490015246.877 NTLMSSP "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - 0 "Business and Industry"

I've started a new config:

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        type => "Ironport"
    }
}

filter {
    if [message] =~ /^#/ {
        drop {}
    }
    grok {
        # match against the "message" field (the raw log line); DATA and NOTSPACE are
        # used because HTTPDUSER won't match DOMAIN\user@NTLM and URIPATH won't match a full URL
        match => { "message" => "%{IPV4:client} \"%{DATA:auth}\" - \[%{MONTHDAY:monthday}/%{MONTH:month}/%{YEAR:year}:%{TIME:time} %{BASE10NUM:offset}\] \"%{WORD:method} %{NOTSPACE:url} " }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}

I restarted Logstash at this point, hoping it would start pulling out those fields, but I'm still getting zero indices built (see first post).
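In case it helps, I've read that you can check whether the config even parses, and then run Logstash in the foreground to watch the rubydebug output and any errors. I believe the paths for the CentOS RPM install are roughly:

# validate the config without starting the pipeline
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/wsa-h002606.conf

# run in the foreground and watch for output/errors
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/wsa-h002606.conf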

How do I twist Logstash's arm to actually do something?!

Thank you!

You're probably running into the same beginner's mistake as everybody else. Look into the file input's sincedb_path option (and ignore_older if you're running Logstash 2.4).
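Concretely, that means something like this in the file input (the sincedb path here is just an example location):

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        type => "Ironport"
        # read the file from the top instead of tailing it
        start_position => "beginning"
        # keep the read position somewhere you can inspect (and delete to force a re-read)
        sincedb_path => "/tmp/wsa-h002606.sincedb"
    }
}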

Hi Magnus,

Thank you for your reply.

I was actually using sincedb_path originally, as I was following this CSV guide: https://qbox.io/blog/import-csv-elasticsearch-logstash-sincedb.

Unfortunately, it doesn't appear to make any difference to my current predicament. I have re-added

sincedb_path => "/tmp/wsa-h002606.sincedb"

but no file gets created there when I restart Logstash.

Ah, tell a lie. The sincedb file does get created, but only on the second service restart.

After leaving it for a few minutes though, then restarting the service again, the contents of the file is just:

 0 0 0

I guess this is indicating that Logstash hasn't processed anything?

Yes, or at least it hasn't recorded anything about what it has done in the sincedb file. To bypass the sincedb functionality and always read the files from the beginning, you can use /dev/null as the sincedb file.
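That is, something along these lines (same input as before, only the sincedb path changed):

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        # /dev/null is never persisted, so every restart re-reads the file from the top
        sincedb_path => "/dev/null"
        type => "Ironport"
    }
}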
