Logstash import CSV data classification

First off, I am relatively inexperienced with ELK and am trying to put together a demo because I'm convinced it is a powerful and useful application for much of our work. I have been given a CSV export of data and am trying to get it into Elasticsearch using Logstash. I think my main trouble is with the data classification, particularly the timestamp.

raw-data:

128797847,33299,1,2019,7,1,0,0.9204,"142322T11 ",30,"N ","A ",101,1.0000,10.0,1
128797847,33299,1,2019,7,1,1,1.2078,"142322T11 ",30,"N ","A ",101,1.0000,10.0,1

Reformatted the time to ISO 8601 and moved it to the front:

2019-7-1T0,128797847,33299,1,0.9204,"142322T11 ",30,"N ","A ",101,1.0000,10.0
2019-7-1T1,128797847,33299,1,1.2078,"142322T11 ",30,"N ","A ",101,1.0000,10.0

My config so far just brings in all the data generically, which is a good first step but is obviously inefficient and somewhat unusable. My initial config is as follows:

input {
  file {
    path => "/home/user/userdata/data.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns => [
      "date",
      "User",
      "Group",
      "Org",
      "Data",
      "Info1",
      "Info2",
      "Info3",
      "Info4",
      "Info5",
      "Info6",
      "Info7"
    ]
    separator => ","
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ["http://localhost:9200"]
    index => "user-data"
  }
}

How do I get the time column recognized as a date field and mapped to @timestamp when generating the index pattern? I'd also like to categorize the other fields a bit more efficiently, so any suggestions on that would be appreciated too.
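As a side note on typing the other columns: the csv filter has a convert option that can cast columns as they are parsed. A minimal sketch; which columns are numeric is only an assumption based on the two sample rows:

    filter {
      csv {
        columns   => [ "date", "User", "Group", "Org", "Data", "Info1", "Info2",
                       "Info3", "Info4", "Info5", "Info6", "Info7" ]
        separator => ","
        # Assumed numeric columns, judging from the sample rows; the rest stay strings
        convert   => {
          "User"  => "integer"
          "Group" => "integer"
          "Org"   => "integer"
          "Data"  => "float"
          "Info2" => "integer"
          "Info5" => "integer"
          "Info6" => "float"
          "Info7" => "float"
        }
      }
    }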

What does the number after the T mean?

I thought that was necessary for the ISO standard, but I have restructured it to remove the T and put a space between the day and the hour. So the example you listed, "2019-7-1T1", now looks like "2019-7-1 1", and that refers to July 1 2019 at 1am.

I am experimenting with the following code in the conf file. I think I got it to work once, but subsequent Logstash runs have failed to add the index as expected, so I may have to backtrack a bit:

    "mappings":{
    "date":{"type":"date","format":"yyyy-M-d H"}
    }

Edit: Corrected the date format with the correct day form, though it is still not working.
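One way to wire a mapping like that into the pipeline is to save it inside a full index template file and reference it from the elasticsearch output. A minimal sketch, where the file path and template name are only assumptions:

    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "user-data"
        # Assumed path to a file containing a complete index template whose
        # "mappings" section includes the "date" field definition shown above
        template           => "/home/user/userdata/user-data-template.json"
        template_name      => "user-data"
        template_overwrite => true
      }
    }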

That is one approach; another is to use a date filter:

date { match => [ "date", "yyyy-M-d H" ] }

That will overwrite the @timestamp field. Even if you set the target to be another field name, that field will automatically be created as a date in Elasticsearch, provided it has not already been created with some other type (Elasticsearch tries to do date detection when it indexes a string field for the first time).
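In the context of the config above, that could look roughly like the following; the timezone (and the commented-out target) are assumptions rather than anything taken from the data:

    filter {
      csv {
        columns   => [ "date", "User", "Group", "Org", "Data", "Info1", "Info2",
                       "Info3", "Info4", "Info5", "Info6", "Info7" ]
        separator => ","
      }
      date {
        # Parses e.g. "2019-7-1 1" and writes the result to @timestamp
        match    => [ "date", "yyyy-M-d H" ]
        timezone => "UTC"            # assumed; set to the data's actual timezone
        # target => "event_time"     # optional: parse into another field instead
      }
    }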

@Badger
That worked well on my test data. However, I noticed the data size was growing much faster than expected, and I think my next challenge will be specifying appropriate attributes for the other datasets. But I'll start a different thread for that. Thank you for your help!!!
