Hi everyone,
I come to you because I think I don't understand something about the aggregate filter, or more generally about Logstash.
I'm trying to read a CSV file with the aggregate filter, following this documentation example: Aggregate filter plugin | Logstash Reference [8.11] | Elastic.
The indexing itself runs fine, and the indexed data matches the mapping I want for my index.
The problem is that every time I run the indexing on the same CSV file (to rebuild my index), I get different data in my index. By different data I mean it's as if the indexing skips lines at random.
Here is my aggregate filter:
filter {
  csv {
    separator => ";"
    remove_field => [ "loading_date","message","path","host","@version" ]
    columns => ["platforme_source","instrument_csv","parametre_csv","site_label","plateforme_label","instrument_label","instrument_facette_label","parametre_label","type_parametre","type_donnees","affichage","deploiement","deploiment_startdate","deploiement_enddate"]
    skip_empty_rows => true
    skip_header => true
  }
  aggregate {
    task_id => "%{platforme_source}"
    code => "
      map['platforme_source'] ||= event.get('platforme_source')
      map['plateforme_label'] ||= event.get('plateforme_label')
      map['instruments_code'] ||= []
      map['instruments'] ||= []
      instrument_code = event.get('instrument_csv')
      if ! map['instruments_code'].include?(instrument_code)
        map['instruments_code'] << instrument_code
        map['instruments'] << {'instrument_code' => instrument_code, 'instrument_label' => event.get('instrument_label'), 'instrument_flabel' => event.get('instrument_facette_label')}
      end
      map['parameters'] ||= []
      map['parameters'] << {'parametre_code' => event.get('parametre_csv'), 'parametre_label' => event.get('parametre_label'), 'type_parametre' => event.get('type_parametre'), 'type_donnees' => event.get('type_donnees'), 'instruments_code' => instrument_code}
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}
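For what it's worth, the accumulation logic itself seems deterministic to me. Here is a minimal plain-Ruby sketch of the same map-building code, run over a few made-up rows (the row values are hypothetical, not from my real CSV); outside of Logstash it always produces the same result:

```ruby
# Hypothetical rows standing in for parsed CSV events (same task_id).
rows = [
  {'instrument_csv' => 'I1', 'parametre_csv' => 'A'},
  {'instrument_csv' => 'I1', 'parametre_csv' => 'B'},
  {'instrument_csv' => 'I2', 'parametre_csv' => 'C'},
]

map = {}
rows.each do |event|
  map['instruments_code'] ||= []
  map['parameters'] ||= []
  instrument_code = event['instrument_csv']
  # Deduplicate instruments, exactly as in the aggregate code block.
  unless map['instruments_code'].include?(instrument_code)
    map['instruments_code'] << instrument_code
  end
  # Parameters are accumulated without deduplication.
  map['parameters'] << {'parametre_code' => event['parametre_csv'],
                        'instruments_code' => instrument_code}
end

puts map['instruments_code'].inspect
puts map['parameters'].length
```

Run over rows in order, this always yields 2 distinct instruments and 3 parameters, which is why the varying counts in Elasticsearch surprise me.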
To describe my problem simply, here is an example of what the indexed data looks like:
First indexing run: the instruments field is an array containing 3 instruments
Second indexing run: the instruments field is an array containing 5 instruments
Third indexing run: the instruments field is an array containing 1 instrument
But I'm always pointing at the same CSV file, so I expect to get the same number of instruments (and, more generally, exactly the same data) every time I rebuild my index.
I'm not sure I'm being clear, but I hope someone will be able to help me.
Best,
Leo.