Basic logstash example for CSV


#1

Hi, I am trying to get my head around the ELK stack "thing" but I'm being defeated!

I have installed all the latest v5.x components (ELK) onto CentOS 7.

Now, I have tonnes of 100 MB files being FTP'd to the server, which I want Logstash to index into Elasticsearch so I can browse/search/analyze them in Kibana.
Specifically, they are from a content filter, so they're web access logs, but the data is pretty much in CSV format from what I can tell.

I have created a logstash conf file to index one of the many files, for testing:

input {
    file {
        path => "/data/incomingdata/accesslogs.@20170320T130730.s"
        type => "Ironport"
    }
}

filter {
    # skip comment/header lines before doing any parsing
    if [message] =~ /^#/ {
        drop {}
    }
    csv {
        columns => ["col1","col2","col3"]
        separator => ","
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}

So, it's reading in a file, taking the first three comma-delimited columns, and skipping any lines beginning with '#'.
Am I understanding this right?
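
For example, my expectation is that a made-up line like 1.2.3.4,GET,200 would come out of the csv filter looking roughly like this in the rubydebug output (illustrative only; the real event would also carry @timestamp, @version, host, path, etc.):

{
    "message" => "1.2.3.4,GET,200",
       "col1" => "1.2.3.4",
       "col2" => "GET",
       "col3" => "200",
       "type" => "Ironport"
}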

My problem is that Logstash restarts successfully, but no index gets created:

[root@svr-h003349 logstash]# curl 'localhost:9200/_cat/indices?v'
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana 9gCH5oUTS4W3RBSs_b9TlQ   1   1          1            0      3.1kb          3.1kb

I'm a total newb with ELK, so I need some hand-holding! How can I proceed, please?

Thank you!

Best Regards,

Elliot


#2

Okay, I have since realized that only a section of each log entry is actually comma delimited, so it looks like I have to deal with grok.

So here's a sample line from an access log I have:

10.11.23.100 "MY-DOMAIN\jbloggs@NTLM" - [20/Mar/2017:13:07:26 +0000] "GET http://ads35.vertamedia.com/vast/vpaid-config/?width=300&height=250&aid=49253&sid=0&site_full_url=http%3A%2F%2Fwww.dailymail.co.uk%2Fhome%2Findex.html&top_domain=www.dailymail.co.uk&v=2.3.215&t=flash&cb=14900152435947154&video_duration=30 HTTP/1.1" 200 41 TCP_MISS:DIRECT 76 DEFAULT_CASE_12-Level_3_Users-Internal_Users-DefaultGroup-NONE-NONE-DefaultGroup <IW_busi,-3.0,1,"-",-,-,-,1,"-",-,-,-,"-",1,-,"-","-",-,-,IW_busi,-,"-","-","Unknown","Unknown","-","-",41.68,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> - 1490015246.877 NTLMSSP "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - 0 "Business and Industry"

I've started a new config:

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        type => "Ironport"
    }
}

filter {
    if [message] =~ /^#/ {
        drop{}
    }
    grok {
        # match against the event's "message" field; NOTSPACE is used for
        # the URL because URIPATH won't match a full "http://..." URL
        match => { "message" => "%{IPV4:client} \"%{HTTPDUSER:auth}\" - \[%{MONTHDAY:monthday}/%{MONTH:month}/%{YEAR:year}:%{TIME} %{BASE10NUM:offset}\] \"%{WORD:method} %{NOTSPACE:url} " }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}
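
(Side note: from what I've read, you can sanity-check a grok pattern in isolation by pasting sample lines over stdin, with a throwaway config along these lines, a sketch I haven't run yet:)

input {
    stdin { }   # paste sample log lines straight into the terminal
}

filter {
    grok {
        # same pattern as above, shortened here for the sketch
        match => { "message" => "%{IPV4:client} \"%{HTTPDUSER:auth}\"" }
    }
}

output {
    stdout { codec => rubydebug }   # prints the parsed event immediately
}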

I restarted Logstash at this point, hoping it would start pulling out those specified fields, but still no index gets built (see first post).

How do I twist logstash's arm to actually do something?!

Thank you!


(Magnus Bäck) #3

You're probably running into the same beginner's mistake as everybody else. Look into the file input's sincedb_path option (and ignore_older if you're running Logstash 2.4).
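
By default the file input tails files (start_position defaults to "end"), so a file that was already fully written when Logstash started will never produce any events. A sketch of the usual fix (the glob and sincedb path here are just illustrative):

input {
    file {
        path => "/data/incomingdata/*.s"            # glob to pick up every transferred log
        start_position => "beginning"               # only applies to files with no sincedb entry
        sincedb_path => "/tmp/accesslogs.sincedb"   # where read positions are remembered
        type => "Ironport"
    }
}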


#4

Hi Magnus,

Thank you for your reply.

I was actually using sincedb_path originally, as I was following this CSV guide: https://qbox.io/blog/import-csv-elasticsearch-logstash-sincedb.

Unfortunately it doesn't appear to make any difference to my current predicament. I have re-added

sincedb_path => "/tmp/wsa-h002606.sincedb"

but no file gets created there when Logstash restarts.


#5

Ah, tell a lie. The sincedb file does get created, but only on the second service restart.

After leaving it for a few minutes, then restarting the service again, the contents of the file are just:

 0 0 0

I guess this is indicating that logstash hasn't processed anything?
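
From what I can find, each sincedb line should hold four numbers (inode, major device number, minor device number, and byte offset into the file), so a healthy entry would look something like this made-up one:

2097158 0 64 104857600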


(Magnus Bäck) #6

Yes, or at least it hasn't recorded anything it has done in the sincedb file. To ignore the sincedb functionality and always read the files from the beginning, you can use /dev/null as the sincedb file.
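
Concretely, something like this (a sketch):

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        sincedb_path => "/dev/null"   # discard positions, so the file is re-read on every restart
        type => "Ironport"
    }
}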


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.