Elastic indexing problem with CSV files


(Charles-Henri Boust) #1

Hello,

I installed the ELK stack on a RHEL machine and I am trying to index CSV files.

My CSV looks like this:
s00va9924890;TSM-ARZ109;2016/01/23;02:02:40;00:01:12;48504;234;0;53.07;76%;0.0597949;;;TSM_backup_INCR_APPLI_20160123_18874576.log;;INCR_APPLI;dsm.opt.incr;v5.0.2

My Logstash configuration:

input {
  file {
    path => "/apps/logstash/input/sibr/.crs..csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => [
      "HOSTNAME",
      "INSTANCE",
      "DATE_INJECTION",
      "HEURE_INJECTION",
      "DUREE",
      "INSPECTED",
      "BACKUP",
      "FAILED",
      "TRANSFERT_TIME",
      "TAUX_CP",
      "VOLUME",
      "SAUVE",
      "CLASSARCH",
      "LOG",
      "APP_SAVE",
      "SAVE_MODE",
      "DSM_OPT",
      "VER_SCRIPTS"
    ]
    separator => ";"
    remove_field => ["message","host","path","@version","@timestamp"]
  }
  mutate {
    convert => [ "LOG", "string" ]
  }
  mutate {
    lowercase => [ "HOSTNAME" ]
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    action => "index"
    index => "sibr"
  }

  stdout {
    codec => rubydebug
  }
}

I create my index through Sense:

PUT sibr
{
  "mappings": {
    "logs": {
      "properties": {
        "DATE_INJECTION": {
          "type": "date",
          "format": "yyyy/MM/dd",
          "ignore_malformed": true
        },
        "HEURE_INJECTION": {
          "type": "date",
          "format": "HH:mm:ss"
        },
        "DUREE": {
          "type": "date",
          "format": "HH:mm:ss",
          "ignore_malformed": true
        },
        "TRANSFERT_TIME": {
          "type": "date",
          "format": "ss.SSS",
          "ignore_malformed": true
        },
        "VOLUME": {
          "type": "float",
          "ignore_malformed": true
        },
        "FAILED": {
          "type": "integer",
          "ignore_malformed": true
        },
        "INSPECTED": {
          "type": "float",
          "ignore_malformed": true
        },
        "BACKUP": {
          "type": "integer",
          "ignore_malformed": true
        }
      }
    }
  }
}

Then I run the import into Elastic through Logstash in the usual way:

bin/logstash -f conf/logstash.conf --verbose

But at some point the CSV values shift relative to the column names, and from there on the indexed data is nonsense. For example:

[2016-01-26 10:29:09,594][DEBUG][action.bulk              ] [s00vl9931488] [sibr][4] failed to execute bulk item (index) index {[sibr][logs][AVJ9Q-yliY-Gr-iuz-Pp], source[{"HOSTNAME":"cr_appli","INSTANCE":"dsm.opt.incr","DATE_INJECTION":"v5.0.3"}]}
MapperParsingException[failed to parse [DATE_INJECTION]]; nested: IllegalArgumentException[Invalid format: "v5.0.3"];
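
One possible reading of this error: the three values in the failing document (cr_appli, dsm.opt.incr, v5.0.3) look like the last three fields of a CSV line (SAVE_MODE, DSM_OPT, VER_SCRIPTS) assigned to the first three columns, as if only the tail of a line had been parsed. The Python sketch below is an illustration under that assumption, not a confirmed diagnosis. It assigns values to columns by position after splitting on ";" and flags lines whose field count is not 18. The fragment string and the helper names parse_line and find_misaligned are hypothetical.

```python
# Column names copied from the csv filter in the Logstash configuration.
COLUMNS = [
    "HOSTNAME", "INSTANCE", "DATE_INJECTION", "HEURE_INJECTION", "DUREE",
    "INSPECTED", "BACKUP", "FAILED", "TRANSFERT_TIME", "TAUX_CP", "VOLUME",
    "SAUVE", "CLASSARCH", "LOG", "APP_SAVE", "SAVE_MODE", "DSM_OPT",
    "VER_SCRIPTS",
]


def parse_line(line, columns=COLUMNS, sep=";"):
    """Assign values to column names by position, like a positional CSV parser."""
    values = line.rstrip("\n").split(sep)
    # zip stops at the shorter sequence, so a short line fills only the first columns.
    return dict(zip(columns, values))


def find_misaligned(lines, expected=len(COLUMNS), sep=";"):
    """Return (line_number, field_count) for each line with an unexpected field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split(sep))
        if count != expected:
            bad.append((number, count))
    return bad


# Sample line from the post (18 fields).
full = ("s00va9924890;TSM-ARZ109;2016/01/23;02:02:40;00:01:12;48504;234;0;"
        "53.07;76%;0.0597949;;;TSM_backup_INCR_APPLI_20160123_18874576.log;;"
        "INCR_APPLI;dsm.opt.incr;v5.0.2")

# Hypothetical truncated input: only the tail of a line reaches the parser.
fragment = "CR_APPLI;dsm.opt.incr;v5.0.3"

print(parse_line(fragment))
print(find_misaligned([full, fragment]))
```

Running find_misaligned over the input files before starting Logstash would show whether the files themselves contain short lines, or whether the truncation only appears once the files are being read.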

Also, I do not find my mapping after indexing:

GET sibr/_mappings
{
  "sibr": {
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "@version": {
            "type": "string"
          },
          "APP_SAVE": {
            "type": "string"
          },
          "BACKUP": {
            "type": "string"
          },
          "CLASSARCH": {
            "type": "string"
          },
          "DATE_INJECTION": {
            "type": "date",
            "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
          },
          "DSM_OPT": {
            "type": "string"
          },
          "DUREE": {
            "type": "string"
          },
          "FAILED": {
            "type": "string"
          },
          "HEURE_INJECTION": {
            "type": "string"
          },
          "HOSTNAME": {
            "type": "string"
          },
          "INSPECTED": {
            "type": "string"
          },
          "INSTANCE": {
            "type": "string"
          },
          "LOG": {
            "type": "string"
          },
          "SAUVE": {
            "type": "string"
          },
          "SAVE_MODE": {
            "type": "string"
          },
          "TAUX_CP": {
            "type": "string"
          },
          "TRANSFERT_TIME": {
            "type": "string"
          },
          "VER_SCRIPTS": {
            "type": "string"
          },
          "VOLUME": {
            "type": "string"
          },
          "host": {
            "type": "string"
          },
          "message": {
            "type": "string"
          },
          "path": {
            "type": "string"
          },
          "tags": {
            "type": "string"
          }
        }
      }
    }
  }
}
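
To make the gap between the two mappings explicit, here is a small Python sketch that compares the "type" and "format" of each field declared in the PUT with what the GET returned. The helper name diff_mapping is hypothetical, and the two dictionaries are copied by hand from the mappings above. It only lists differences and does not explain why they occur.

```python
def diff_mapping(expected, actual, keys=("type", "format")):
    """Return {(field, key): (expected_value, actual_value)} for every mismatch."""
    diffs = {}
    for field, spec in expected.items():
        got = actual.get(field, {})
        for key in keys:
            if spec.get(key) != got.get(key):
                diffs[(field, key)] = (spec.get(key), got.get(key))
    return diffs


# Properties declared in the PUT sibr request.
expected = {
    "DATE_INJECTION": {"type": "date", "format": "yyyy/MM/dd"},
    "HEURE_INJECTION": {"type": "date", "format": "HH:mm:ss"},
    "DUREE": {"type": "date", "format": "HH:mm:ss"},
    "TRANSFERT_TIME": {"type": "date", "format": "ss.SSS"},
    "VOLUME": {"type": "float"},
    "FAILED": {"type": "integer"},
    "INSPECTED": {"type": "float"},
    "BACKUP": {"type": "integer"},
}

# Same fields as returned by GET sibr/_mappings after indexing.
actual = {
    "DATE_INJECTION": {
        "type": "date",
        "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis",
    },
    "HEURE_INJECTION": {"type": "string"},
    "DUREE": {"type": "string"},
    "TRANSFERT_TIME": {"type": "string"},
    "VOLUME": {"type": "string"},
    "FAILED": {"type": "string"},
    "INSPECTED": {"type": "string"},
    "BACKUP": {"type": "string"},
}

diffs = diff_mapping(expected, actual)
for (field, key), (want, got) in sorted(diffs.items()):
    print(f"{field}.{key}: expected {want!r}, got {got!r}")
```

Every declared field differs from what was requested, including the format of DATE_INJECTION. That would be consistent with the index having been created again without the explicit mapping, although nothing in the thread confirms this.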


(Charles-Henri Boust) #2

For a reason I do not understand, I managed to run an import without a hitch, and all my data arrived the way I wanted.

