Logstash aggregate filter never indexes the same data from a CSV file

Hi everyone,

I come to you because I think I don't understand something about the aggregate filter, or more generally about Logstash.

I'm trying to read a CSV file with the aggregate filter, reproducing something like this example from the documentation: https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html#plugins-filters-aggregate-example4

The indexing itself runs fine, and the indexed data matches the mapping I want for my index.

The problem is that every time I run the indexing on the same CSV file (to rebuild my index), I get different data in my index. By "different data" I mean that the indexing seems to skip lines randomly.

Here is my filter configuration:

filter {
  csv {
    separator => ";"
    remove_field => [ "loading_date","message","path","host","@version" ]
    columns => ["platforme_source","instrument_csv","parametre_csv","site_label","plateforme_label","instrument_label","instrument_facette_label","parametre_label","type_parametre","type_donnees","affichage","deploiement","deploiment_startdate","deploiement_enddate" ]
    skip_empty_rows => true
    skip_header => true
  }
  aggregate {
    task_id => "%{platforme_source}"
    code => "
      map['platforme_source'] ||= event.get('platforme_source')
      map['plateforme_label'] ||= event.get('plateforme_label')

      map['instruments_code'] ||= []
      map['instruments'] ||= []
      instrument_code = event.get('instrument_csv')

      if ! map['instruments_code'].include?(instrument_code)
        map['instruments_code'] << instrument_code
        map['instruments'] << {'instrument_code' => instrument_code, 'instrument_label' => event.get('instrument_label'), 'instrument_flabel' => event.get('instrument_facette_label')}
      end

      map['parameters'] ||= []
      map['parameters'] << {'parametre_code' => event.get('parametre_csv'), 'parametre_label' => event.get('parametre_label'), 'type_parametre' => event.get('type_parametre'), 'type_donnees' => event.get('type_donnees'), 'instruments_code' => instrument_code}

      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

To describe my problem simply, here is what my indexed data looks like:

First indexation: the instruments field is an array containing 3 instruments
Second indexation: the instruments field is an array containing 5 instruments
Third indexation: the instruments field is an array containing 1 instrument

But I'm always pointing at the same CSV file, so I expect to get the same number of instruments (and more generally exactly the same data) every time I rebuild my index.
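For what it's worth, the map-building logic itself looks deterministic to me. Here is a standalone plain-Ruby sketch of the same code from my `aggregate` block, run against a few made-up rows (the values below are hypothetical, not my real data) — it produces the same counts on every run:

```ruby
# Hypothetical rows, standing in for parsed CSV events of one task_id.
rows = [
  { 'platforme_source' => 'P1', 'plateforme_label' => 'Platform 1',
    'instrument_csv' => 'I1', 'instrument_label' => 'Instrument 1',
    'instrument_facette_label' => 'Facet 1', 'parametre_csv' => 'PA1',
    'parametre_label' => 'Param 1', 'type_parametre' => 'T1', 'type_donnees' => 'D1' },
  { 'platforme_source' => 'P1', 'plateforme_label' => 'Platform 1',
    'instrument_csv' => 'I1', 'instrument_label' => 'Instrument 1',
    'instrument_facette_label' => 'Facet 1', 'parametre_csv' => 'PA2',
    'parametre_label' => 'Param 2', 'type_parametre' => 'T1', 'type_donnees' => 'D1' },
  { 'platforme_source' => 'P1', 'plateforme_label' => 'Platform 1',
    'instrument_csv' => 'I2', 'instrument_label' => 'Instrument 2',
    'instrument_facette_label' => 'Facet 2', 'parametre_csv' => 'PA3',
    'parametre_label' => 'Param 3', 'type_parametre' => 'T2', 'type_donnees' => 'D1' }
]

map = {}
rows.each do |event|
  map['platforme_source'] ||= event['platforme_source']
  map['plateforme_label'] ||= event['plateforme_label']

  map['instruments_code'] ||= []
  map['instruments'] ||= []
  instrument_code = event['instrument_csv']

  # Only add each instrument once, keyed by its code.
  unless map['instruments_code'].include?(instrument_code)
    map['instruments_code'] << instrument_code
    map['instruments'] << { 'instrument_code'  => instrument_code,
                            'instrument_label' => event['instrument_label'],
                            'instrument_flabel' => event['instrument_facette_label'] }
  end

  # Every row contributes one parameter entry.
  map['parameters'] ||= []
  map['parameters'] << { 'parametre_code'   => event['parametre_csv'],
                         'parametre_label'  => event['parametre_label'],
                         'type_parametre'   => event['type_parametre'],
                         'type_donnees'     => event['type_donnees'],
                         'instruments_code' => instrument_code }
end

puts map['instruments'].length  # always 2 unique instruments
puts map['parameters'].length   # always 3 parameters
```

So the nondeterminism must come from how Logstash feeds events to the filter, not from the aggregation code itself.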

I'm not sure I'm being clear, but I hope someone will be able to help me :slight_smile:

Best,
Leo.
