Logstash aggregate filter never indexes the same data from a CSV file

Hi everyone,

I come to you because I think I don't understand something about the aggregate filter, or more generally about Logstash.

I'm trying to read a CSV file and use the aggregate filter, reproducing something like this documentation example: Aggregate filter plugin | Logstash Reference [8.11] | Elastic

The indexing itself runs fine, and the indexed data matches the mapping I want for my index.

The problem is that every time I run the indexing on the same CSV file (to rebuild my index), I get different data in my index. By different data I mean it looks like the indexing skips lines randomly.

Here is my filter configuration:

filter {
  csv {
    separator => ";"
    remove_field => [ "loading_date", "message", "path", "host", "@version" ]
    columns => ["platforme_source", "instrument_csv", "parametre_csv", "site_label", "plateforme_label", "instrument_label", "instrument_facette_label", "parametre_label", "type_parametre", "type_donnees", "affichage", "deploiement", "deploiment_startdate", "deploiement_enddate"]
    skip_empty_rows => true
    skip_header => true
  }
  aggregate {
    task_id => "%{platforme_source}"
    code => "
      # Remember the platform fields once per task
      map['platforme_source'] ||= event.get('platforme_source')
      map['plateforme_label'] ||= event.get('plateforme_label')

      map['instruments_code'] ||= []
      map['instruments'] ||= []
      instrument_code = event.get('instrument_csv')

      # Add each instrument only once per platform
      if ! map['instruments_code'].include?(instrument_code)
        map['instruments_code'] << instrument_code
        map['instruments'] << {'instrument_code' => instrument_code, 'instrument_label' => event.get('instrument_label'), 'instrument_flabel' => event.get('instrument_facette_label')}
      end

      # One parameter entry per CSV line
      map['parameters'] ||= []
      map['parameters'] << {'parametre_code' => event.get('parametre_csv'), 'parametre_label' => event.get('parametre_label'), 'type_parametre' => event.get('type_parametre'), 'type_donnees' => event.get('type_donnees'), 'instruments_code' => instrument_code}

      # Drop the original CSV event; only the aggregated map is indexed
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}
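For reference, a line of my CSV looks something like this (the values here are invented, but the column order matches the columns list declared above):

PF01;INST01;PARAM01;Site A;Platform A;Instrument 1;Instrument 1 facet;Parameter 1;physical;measure;true;DEP01;2020-01-01;2020-12-31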

To describe my problem simply, here is what the indexed data looks like across runs:

First run: the instruments field is an array containing 3 instruments
Second run: the instruments field is an array containing 5 instruments
Third run: the instruments field is an array containing 1 instrument

But I'm always pointing at the same CSV file, so I expect to always get the same number of instruments indexed (and more generally exactly the same data) every time I rebuild my index.
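To make the expected result concrete, here is a sketch of the aggregated document I expect to get for one platform (the field values are invented, but the structure follows the aggregate code above):

{
  "platforme_source": "PF01",
  "plateforme_label": "Platform A",
  "instruments_code": ["INST01", "INST02"],
  "instruments": [
    {"instrument_code": "INST01", "instrument_label": "Instrument 1", "instrument_flabel": "Instrument 1 facet"},
    {"instrument_code": "INST02", "instrument_label": "Instrument 2", "instrument_flabel": "Instrument 2 facet"}
  ],
  "parameters": [
    {"parametre_code": "PARAM01", "parametre_label": "Parameter 1", "type_parametre": "physical", "type_donnees": "measure", "instruments_code": "INST01"},
    {"parametre_code": "PARAM02", "parametre_label": "Parameter 2", "type_parametre": "physical", "type_donnees": "measure", "instruments_code": "INST02"}
  ]
}

The number of entries in instruments should depend only on the distinct instrument_csv values for that platform in the file, which is why I don't understand how it can change between runs.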

I'm not sure I'm being clear, but I hope someone will be able to help me :slight_smile:

Best,
Leo.

Please do not post pictures of text. They are very hard to read and not searchable.
