Logstash import CSV data classification

First off, I am relatively inexperienced with ELK and am trying to put together a demo because I'm convinced it is a powerful and useful application for much of our work. I have been given a CSV export of data and am trying to get it into Elasticsearch using Logstash. I think my main trouble is with the data classification, particularly the timestamp.

raw-data:

128797847,33299,1,2019,7,1,0,0.9204,"142322T11 ",30,"N ","A ",101,1.0000,10.0,1
128797847,33299,1,2019,7,1,1,1.2078,"142322T11 ",30,"N ","A ",101,1.0000,10.0,1

Reformatted the time to ISO 8601 and moved it to the front:

2019-7-1T0,128797847,33299,1,0.9204,"142322T11 ",30,"N ","A ",101,1.0000,10.0
2019-7-1T1,128797847,33299,1,1.2078,"142322T11 ",30,"N ","A ",101,1.0000,10.0

My config so far just brings in all the data generically, which is a good first step but is obviously inefficient and somewhat unusable. My initial config is as follows:

input {
  file {
    path => "/home/user/userdata/data.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns => [
      "date",
      "User",
      "Group",
      "Org",
      "Data",
      "Info1",
      "Info2",
      "Info3",
      "Info4",
      "Info5",
      "Info6",
      "Info7"
    ]
    separator => ","
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ["http://localhost:9200"]
    index => "user-data"
  }
}

How do I get the time column recognized as a date field and mapped to @timestamp when generating the index pattern? I'd also like to categorize the other fields a bit more efficiently, so any suggestions on that would be appreciated too.
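As a side note on typing the other columns: the csv filter has a convert option that can cast columns as they are parsed. A minimal sketch; which columns are numeric is only an assumption based on the two sample rows:

    filter {
      csv {
        columns   => [ "date", "User", "Group", "Org", "Data", "Info1", "Info2",
                       "Info3", "Info4", "Info5", "Info6", "Info7" ]
        separator => ","
        # Assumed numeric columns, judging from the sample rows; the rest stay strings
        convert   => {
          "User"  => "integer"
          "Group" => "integer"
          "Org"   => "integer"
          "Data"  => "float"
          "Info2" => "integer"
          "Info5" => "integer"
          "Info6" => "float"
          "Info7" => "float"
        }
      }
    }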

What does the number after the T mean?

I thought that was necessary for the ISO standard, but I have restructured it to remove the T and put a space between the day and the hour. So the example you listed, "2019-7-1T1", now looks like "2019-7-1 1", and that refers to July 1 2019 at 1am.

I am experimenting with the following code in the conf file. I think I got it to work once, but subsequent Logstash runs have failed to add the index as expected, so I may have to backtrack a bit:

    "mappings":{
    "date":{"type":"date","format":"yyyy-M-d H"}
    }

Edit: Corrected the date format with the correct day form, though it is still not working.
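One way to wire a mapping like that into the pipeline is to save it inside a full index template file and reference it from the elasticsearch output. A minimal sketch, where the file path and template name are only assumptions:

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "user-data"
        # Assumed path to a file containing a complete index template whose
        # "mappings" section includes the "date" field definition shown above
        template           => "/home/user/userdata/user-data-template.json"
        template_name      => "user-data"
        template_overwrite => true
      }
    }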

That is one approach; another is to use a date filter:

date { match => [ "date", "yyyy-M-d H" ] }

That will overwrite the @timestamp field. Even if you set the target to be another field name, that field will automatically be created as a date in Elasticsearch, provided it has not already been created with some other type (Elasticsearch tries to do date detection when it indexes a string field for the first time).
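In the context of the config above, that could look roughly like the following; the timezone (and the commented-out target) are assumptions rather than anything taken from the data:

    filter {
      csv {
        columns   => [ "date", "User", "Group", "Org", "Data", "Info1", "Info2",
                       "Info3", "Info4", "Info5", "Info6", "Info7" ]
        separator => ","
      }
      date {
        # Parses e.g. "2019-7-1 1" and writes the result to @timestamp
        match    => [ "date", "yyyy-M-d H" ]
        timezone => "UTC"            # assumed; set to the data's actual timezone
        # target => "event_time"     # optional: parse into another field instead
      }
    }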

@Badger
That worked well on my test data. However, I noticed the data size was growing much faster than expected, and I think my next challenge will be specifying appropriate attributes for the other datasets. But I'll start a different thread for that. Thank you for your help!!!
