Basic logstash example for CSV


#1

Hi, I am trying to get my head around the ELK stack "thing" but I'm being defeated!

I have installed all the latest v5.x components (ELK) onto CentOS 7.

Now, I have tonnes of 100 MB files being FTP'd to the server, which I want Logstash to index into Elasticsearch so I can browse/search/analyze them in Kibana.
Specifically, they are from a content filter, so they're web access logs, but the data is pretty much in CSV format from what I can tell.

I have created a logstash conf file to index one of the many files, for testing:

input {
    file {
        path => "/data/incomingdata/accesslogs.@20170320T130730.s"
        type => "Ironport"
    }
}

filter {
    # skip comment/header lines before doing any parsing
    if [message] =~ /^#/ {
        drop {}
    }
    csv {
        columns => ["col1","col2","col3"]
        separator => ","
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}

So, it's reading in a file, taking the first three comma-delimited columns, and skipping any lines beginning with '#'.
Am I understanding this right?
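
For example, my expectation is that a made-up line like 1.2.3.4,GET,200 would come out of the csv filter looking roughly like this in the rubydebug output (illustrative only; the real event would also carry @timestamp, @version, host, path, etc.):

{
    "message" => "1.2.3.4,GET,200",
       "col1" => "1.2.3.4",
       "col2" => "GET",
       "col3" => "200",
       "type" => "Ironport"
}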

My problem is that Logstash restarts successfully, but no index gets created:

[root@svr-h003349 logstash]# curl 'localhost:9200/_cat/indices?v'
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana 9gCH5oUTS4W3RBSs_b9TlQ   1   1          1            0      3.1kb          3.1kb

I'm a total newb with ELK, so I need some hand-holding! How can I proceed, please?

Thank you!

Best Regards,

Elliot


#2

Okay, I have since realized that only a section of each log entry is actually comma delimited, so it looks like I have to deal with grok.

So here's a sample line from an access log I have:

10.11.23.100 "MY-DOMAIN\jbloggs@NTLM" - [20/Mar/2017:13:07:26 +0000] "GET http://ads35.vertamedia.com/vast/vpaid-config/?width=300&height=250&aid=49253&sid=0&site_full_url=http%3A%2F%2Fwww.dailymail.co.uk%2Fhome%2Findex.html&top_domain=www.dailymail.co.uk&v=2.3.215&t=flash&cb=14900152435947154&video_duration=30 HTTP/1.1" 200 41 TCP_MISS:DIRECT 76 DEFAULT_CASE_12-Level_3_Users-Internal_Users-DefaultGroup-NONE-NONE-DefaultGroup <IW_busi,-3.0,1,"-",-,-,-,1,"-",-,-,-,"-",1,-,"-","-",-,-,IW_busi,-,"-","-","Unknown","Unknown","-","-",41.68,0,-,"-","-",-,"-",-,-,"-","-",-,-,"-"> - 1490015246.877 NTLMSSP "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36" - 0 "Business and Industry"

I've started a new config:

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        type => "Ironport"
    }
}

filter {
    if [message] =~ /^#/ {
        drop{}
    }
    grok {
        # match against the event's "message" field; NOTSPACE is used for
        # the URL because URIPATH won't match a full "http://..." URL
        match => { "message" => "%{IPV4:client} \"%{HTTPDUSER:auth}\" - \[%{MONTHDAY:monthday}/%{MONTH:month}/%{YEAR:year}:%{TIME} %{BASE10NUM:offset}\] \"%{WORD:method} %{NOTSPACE:url} " }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "idx_accesslogs"
    }
    stdout { codec => rubydebug }
}
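
(Side note: from what I've read, you can sanity-check a grok pattern in isolation by pasting sample lines over stdin, with a throwaway config along these lines, a sketch I haven't run yet:)

input {
    stdin { }   # paste sample log lines straight into the terminal
}

filter {
    grok {
        # same pattern as above, shortened here for the sketch
        match => { "message" => "%{IPV4:client} \"%{HTTPDUSER:auth}\"" }
    }
}

output {
    stdout { codec => rubydebug }   # prints the parsed event immediately
}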

I restarted Logstash at this point, hoping it would start pulling out those specified fields, but still no index gets built (see first post).

How do I twist logstash's arm to actually do something?!

Thank you!


(Magnus Bäck) #3

You're probably running into the same beginner's mistake as everybody else. Look into the file input's sincedb_path option (and ignore_older if you're running Logstash 2.4).
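
By default the file input tails files (start_position defaults to "end"), so a file that was already fully written when Logstash started will never produce any events. A sketch of the usual fix (the glob and sincedb path here are just illustrative):

input {
    file {
        path => "/data/incomingdata/*.s"            # glob to pick up every transferred log
        start_position => "beginning"               # only applies to files with no sincedb entry
        sincedb_path => "/tmp/accesslogs.sincedb"   # where read positions are remembered
        type => "Ironport"
    }
}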


#4

Hi Magnus,

Thank you for your reply.

I was actually using sincedb_path originally, as I was following this CSV guide: https://qbox.io/blog/import-csv-elasticsearch-logstash-sincedb.

Unfortunately it doesn't appear to make any difference to my current predicament. I have re-added

sincedb_path => "/tmp/wsa-h002606.sincedb"

but no file gets created there when Logstash restarts.


#5

Ah, tell a lie. The sincedb file does get created, but only on the second service restart.

After leaving it for a few minutes, then restarting the service again, the contents of the file are just:

 0 0 0

I guess this is indicating that logstash hasn't processed anything?
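
From what I can find, each sincedb line should hold four numbers (inode, major device number, minor device number, and byte offset into the file), so a healthy entry would look something like this made-up one:

2097158 0 64 104857600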


(Magnus Bäck) #6

Yes, or at least it hasn't recorded anything it has done in the sincedb file. To ignore the sincedb functionality and always read the files from the beginning, you can use /dev/null as the sincedb file.
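
Concretely, something like this (a sketch):

input {
    file {
        path => "/data/incomingdata/wsa-h002606/accesslogs.@20170320T130730.s"
        start_position => "beginning"
        sincedb_path => "/dev/null"   # discard positions, so the file is re-read on every restart
        type => "Ironport"
    }
}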


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.