Hi everyone,
I come to you because I think I don't understand something about the aggregate filter, or more generally about Logstash.
I'm trying to read a CSV file with the aggregate filter, following this documentation example: Aggregate filter plugin | Logstash Reference [8.11] | Elastic.
The indexing itself runs fine, and the indexed data matches the mapping I want for my index.
The problem is that every time I run the indexing on the same CSV file (to rebuild my index), I get different data in my index. By different data I mean it's as if the indexing skips lines at random.
Here is my aggregate filter:
filter {
  csv {
    separator => ";"
    remove_field => [ "loading_date","message","path","host","@version" ]
    columns => ["platforme_source","instrument_csv","parametre_csv","site_label","plateforme_label","instrument_label","instrument_facette_label","parametre_label","type_parametre","type_donnees","affichage","deploiement","deploiment_startdate","deploiement_enddate"]
    skip_empty_rows => true
    skip_header => true
  }
  aggregate {
    task_id => "%{platforme_source}"
    code => "
      map['platforme_source'] ||= event.get('platforme_source')
      map['plateforme_label'] ||= event.get('plateforme_label')
      map['instruments_code'] ||= []
      map['instruments'] ||= []
      instrument_code = event.get('instrument_csv')
      if ! map['instruments_code'].include?(instrument_code)
        map['instruments_code'] << instrument_code
        map['instruments'] << {'instrument_code' => instrument_code, 'instrument_label' => event.get('instrument_label'), 'instrument_flabel' => event.get('instrument_facette_label')}
      end
      map['parameters'] ||= []
      map['parameters'] << {'parametre_code' => event.get('parametre_csv'), 'parametre_label' => event.get('parametre_label'), 'type_parametre' => event.get('type_parametre'), 'type_donnees' => event.get('type_donnees'), 'instruments_code' => instrument_code}
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}
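For what it's worth, the accumulation logic itself seems deterministic to me. Here is a minimal plain-Ruby sketch of the same map-building code, run over a few made-up rows (the row values are hypothetical, not from my real CSV); outside of Logstash it always produces the same result:

```ruby
# Hypothetical rows standing in for parsed CSV events (same task_id).
rows = [
  {'instrument_csv' => 'I1', 'parametre_csv' => 'A'},
  {'instrument_csv' => 'I1', 'parametre_csv' => 'B'},
  {'instrument_csv' => 'I2', 'parametre_csv' => 'C'},
]

map = {}
rows.each do |event|
  map['instruments_code'] ||= []
  map['parameters'] ||= []
  instrument_code = event['instrument_csv']
  # Deduplicate instruments, exactly as in the aggregate code block.
  unless map['instruments_code'].include?(instrument_code)
    map['instruments_code'] << instrument_code
  end
  # Parameters are accumulated without deduplication.
  map['parameters'] << {'parametre_code' => event['parametre_csv'],
                        'instruments_code' => instrument_code}
end

puts map['instruments_code'].inspect
puts map['parameters'].length
```

Run over rows in order, this always yields 2 distinct instruments and 3 parameters, which is why the varying counts in Elasticsearch surprise me.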
To describe my problem simply, here is an example of what the indexed data looks like:
First indexing run: the instruments field is an array containing 3 instruments
Second indexing run: the instruments field is an array containing 5 instruments
Third indexing run: the instruments field is an array containing 1 instrument
But I'm always pointing at the same CSV file, so I expect to get the same number of instruments (and, more generally, exactly the same data) every time I rebuild my index.
I'm not sure I'm being clear, but I hope someone will be able to help me.
Best,
Leo.