[Logstash - Ruby Filter] - Execution of the filter occurs only once for cloned or splitted events

Hi All,

I have a real strange behavior with ruby filters.
I am using Elastic Stack 7.6.0 on Docker containers.

This I have see occurs when an event get splitted ( with the split filter ) or when the event gets cloned ( with the clone filter ).

The use case if the following:

I need to calculate next calendar month starting on the first day. For this I do the following:

  1. Use a ruby filter to get the date filed and split for the month and yeat like so:

     ruby {  id => "RubyDateYM"
     code => "
  2. I transform the fields to integer

    mutate {  id => "MutateYM"
    convert => {"tfzz_orgY" => "integer"}
    convert => {"tfzz_orgM" => "integer"}
  3. Then comes the ruby filter I think is has problems with:

I added the variable reset after the filter to validate if the filter gets executed only once. You can see that this is the case.

    ruby {  id => "RubySLACalculation"`
        		init => "@M=0;@X=0;@DT='';@MS='';@MM=0;@XX=0"
        		code =>  "@M=event.get('tfzz_orgM');

        				if (@M == 12 || @M == '12') then
        				   @MM = 1;
        				   @XX= @X.to_i + 1;
        				   @MM = @M.to_i + 1;
        				   @XX= @X.to_i;

        				if @MM == 10
        				   @MS = @MM.to_s;
        				elsif @MM == 11
        				   @MS = @MM.to_s;
        				elsif @MM == 12
        				   @MS = @MM.to_s;
        				   @MS = '0'+@MM.to_s;


  1. Then I do an elasticsearch filter that fails, sometimes.

     elasticsearch {
                              id => "ESCalendar"
                              hosts => ["elasticsearch:9200"]
                              user => "elastic"
                              password => "changeme"
                              index => "belegcalendar*"
                              sort => "inDate:desc"
                              query => "(MM:%{tfzz_slaM}) AND (YY:%{tfzz_slaY})"
                              fields => { "date27" => "tfzz_slaTargetDate" }
                              #remove_field => ["tfzz_slaY","tfzz_slaM" ]

You can see this in the logs:

`[2020-04-09T10:41:08,602][WARN ][logstash.outputs.elasticsearch][nb-ems-beleginfo_ndm] Could not index event to Elasticsearch. {:status=>400, :action=>["update", {:_id=>"DBEN0019222960", :_index=>"nb-jms-beleginfo-ndm-2020.04.09", :routing=>nil, :_type=>"_doc", :retry_on_conflict=>1}, #<LogStash::Event:0x5fb0ad2a>], :response=>{"update"=>{"_index"=>"nb-jms-beleginfo-ndm-2020.04.09", "_type"=>"_doc", "_id"=>"DBEN0019222960", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [tfzz_slaStartDate] of type [date] in document with id 'DBEN0019222960'. Preview of field's value: '2020--01'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2020--01] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}`

[2020-04-09T10:50:37,376][WARN ][logstash.filters.elasticsearch][nb-ems-beleginfo_ndm] Failed to query elasticsearch for previous event {:index=>"belegcalendar*", :error=>"[400] {\"error\":{\"root_cause\":[{\"type\":\"query_shard_exception\",\"reason\":\"Failed to parse query [(MM:) AND (YY:0)]\",\"index_uuid\":\"CDe-qZAfTuuo3RFDJZgQ9A\",\"index\":\"belegcalendar\"}],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[{\"shard\":0,\"index\":\"belegcalendar\",\"node\":\"UvjcT6bISwGlQy_Mk05JQw\",\"reason\":{\"type\":\"query_shard_exception\",\"reason\":\"Failed to parse query [(MM:) AND (YY:0)]\",\"index_uuid\":\"CDe-qZAfTuuo3RFDJZgQ9A\",\"index\":\"belegcalendar\",\"caused_by\":{\"type\":\"parse_exception\",\"reason\":\"Cannot parse '(MM:) AND (YY:0)': Encountered \\\" \\\")\\\" \\\") \\\"\\\" at line 1, column 4.\\nWas expecting one of:\\n <BAREOPER> ...\\n \\\"(\\\" ...\\n \\\"*\\\" ...\\n <QUOTED> ...\\n <TERM> ...\\n <PREFIXTERM> ...\\n <WILDTERM> ...\\n <REGEXPTERM> ...\\n \\\"[\\\" ...\\n \\\"{\\\" ...\\n <NUMBER> ...\\n \",\"caused_by\":{\"type\":\"parse_exception\",\"reason\":\"Encountered \\\" \\\")\\\" \\\") \\\"\\\" at line 1, column 4.\\nWas expecting one of:\\n <BAREOPER> ...\\n \\\"(\\\" ...\\n \\\"*\\\" ...\\n <QUOTED> ...\\n <TERM> ...\\n <PREFIXTERM> ...\\n <WILDTERM> ...\\n <REGEXPTERM> ...\\n \\\"[\\\" ...\\n \\\"{\\\" ...\\n <NUMBER> ...\\n \"}}}}]},\"status\":400}"}

If I check the context of the event I get the following:

As you can see from the timestamp, it appear no to execute the filter at least once. Either it retains the value of the varible of the previous event or doesnt calculate it at all.

Any guidance will be very appreciated.

Thanks a lot,

Why are you using instance variables when you do not want values persisted across events? Try removing all of the @.

Hi Badger,

Thank you for your response!

I did the change and now I get

(ruby filter code):3: warning: already initialized constant

All over the place...

For some reason every variable gets treated as a constant ??
Isn't there a way to define local variable in the ruby script ?

Thanks a lot,

Hi Badger,

I changed the variable names like this:

Local variables begin with a lowercase letter or _.

Now it seems to be working. I'll monitor it in the next couple of days, and will update the post with the findings.

Thanks a lot!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.