Hi,
First of all, I'm a beginner with the ELK stack, so please don't hurt me if I did stupid things
I am importing CSV files that have the following structure:
"nomprenom";"sexe";"datenaiss";"lieunaiss";"commnaiss";"paysnaiss";"datedeces";"lieudeces";"actedeces"
"ABILY*FRANCOIS SIMON MARIE/";"1";"19600216";"29105";"LANDIVISIAU";"";"20200718";"01004";"143"
"HUC*JEANNINE GABRIELLE ANNA/";"2";"19270209";"81125";"LACAZE";"";"20200716";"01004";"142"
The columns datenaiss
and datedeces
contain birth dates and death dates respectively.
I wanted to import these pieces of information as dates in Elasticsearch so that I can calculate some statistics, such as the mean lifetime, or to be able to select records with conditions like "datedeces between 15 january 2014 and 25 september 2019".
However the dates are sometimes not fully known. If the year isn't known, it will be 0000. If the month or the day isn't known, it will be 00. So I can have dates like 19780500
or 20190025
or 00000930
. So I can't import them as dates, as logstash won't be able to convert those strings to dates.
I decided to split each date into three fields: annee
, mois
and jour
(year, month, and day), each one being a number. But is there a way to handle in Kibana the combination of these three fields as a date ?
Here is my Logstash config:
input {
s3 {
access_key_id => "myaccesskeyid"
secret_access_key => "mysecretaccesskey"
bucket => "mybucket"
endpoint => "https://s3.nl-ams.scw.cloud"
region => "nl-ams"
watch_for_new_files => "false"
}
}
filter {
csv {
separator => ";"
skip_header => "true"
columns => ["nomprenom", "sexe", "datenaiss", "lieunaiss", "commnaiss", "paysnaiss", "datedeces", "lieudeces", "actedeces"]
}
mutate {
convert => { "sexe" => "integer" }
split => { "nomprenom" => "*" }
add_field => {
"nom" => "%{[nomprenom][0]}"
"prenoms" => "%{[nomprenom][1]}"
}
}
mutate {
gsub => ["prenoms", "/", ""]
}
ruby {
code => 'event.set("[naiss][date][annee]", event.get("datenaiss")[0..3]);
event.set("[naiss][date][mois]", event.get("datenaiss")[4..5]);
event.set("[naiss][date][jour]", event.get("datenaiss")[6..7]);
event.set("[deces][date][annee]", event.get("datedeces")[0..3]);
event.set("[deces][date][mois]", event.get("datedeces")[4..5]);
event.set("[deces][date][jour]", event.get("datedeces")[6..7]);'
}
mutate {
rename => {
"lieunaiss" => "[naiss][codelieu]"
"commnaiss" => "[naiss][commune]"
"paysnaiss" => "[naiss][pays]"
"lieudeces" => "[deces][codelieu]"
"actedeces" => "[deces][numacte]"
}
remove_field => ["nomprenom", "message", "host", "geoip", "datenaiss", "datedeces"]
}
if [naiss][pays] == "" {
mutate {
replace => ["[naiss][pays]", "FRANCE"]
}
}
mutate {
convert => {
"[naiss][date][annee]" => "integer"
"[naiss][date][mois]" => "integer"
"[naiss][date][jour]" => "integer"
"[deces][date][annee]" => "integer"
"[deces][date][mois]" => "integer"
"[deces][date][jour]" => "integer"
}
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "logstash-insee-deces"
workers => 1
user => "myuser"
password => "mypassword"
}
}
The resulting data structure is:
{
"_index": "logstash-insee-deces",
"_type": "_doc",
"_id": "LOW53HQBVzcA8wC1_M15",
"_version": 1,
"_score": null,
"_source": {
"prenoms": "ALFRED VICTOR",
"naiss": {
"date": {
"mois": 1,
"jour": 5,
"annee": 1899
},
"pays": "FRANCE",
"codelieu": "02691",
"commune": "SAINT-QUENTIN"
},
"sexe": 1,
"@timestamp": "2020-09-30T01:56:50.110Z",
"nom": "RAYEZ",
"deces": {
"date": {
"mois": 4,
"jour": 5,
"annee": 1972
},
"codelieu": "93007",
"numacte": "103"
},
"@version": "1"
},
"fields": {
"@timestamp": [
"2020-09-30T01:56:50.110Z"
]
},
"sort": [
1601431010110
]
}
codelieu
fields and numacte
field are strings because they can also contain letters.
So... my three big questions are:
- Is there a way to create a date from three fields in Kibana, knowing that sometimes the day or the month or the year may be missing ?
- How would you deal with those inconsistent dates ?
- Is my logstash configuration optimizable? (aka: is there a simpler and/or faster way of doing the things I'm doing?)
Thank you very much
William Blondel