How to create a script statically in Logstash 7.17

Hi, I would like to add elements to a nested field in Elasticsearch via a script sent from Logstash.

Currently, the compilations and cache_evictions counters keep increasing.

I have also increased max_compilations_rate:

PUT _cluster/settings
{
  "transient": {
    "script.max_compilations_rate": "100000/1m"
  }
}

Currently there are N nodes, and each node reports high compilations and cache_evictions values.

We know that these counters are reset when a node restarts.
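
For reference, we are reading these values from the node stats API:

GET _nodes/stats/script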

"nodes" : {
    "1" : {
      "script" : {
        "compilations" : 1042572,
        "cache_evictions" : 1042472,
        "compilation_limit_triggered" : 0
      }
    },
    "2" : {
      "script" : {
        "compilations" : 1059702,
        "cache_evictions" : 1059602,
        "compilation_limit_triggered" : 0
      }
    },
                   .
                   .
}

This seems to have a bad effect on the cluster.

Is there any way to use static scripts?

Below is the current Logstash code.

elasticsearch {
      id => "begins"
      hosts => ["http://10.10.155.193:9210"]
      index => "%{[@metadata][_index_name]}"
      document_id => "%{[@metadata][_id]}"
      routing => "%{[@metadata][_routing_id]}"
      action => "update"
      doc_as_upsert => false
      script => "
        if (ctx._source.temp_wtime != null) {
          if (ctx._source.begin_wtime == null) {
            ctx._source.begin_wtime = new ArrayList();
          }

          if (ctx._source.begin_wtime instanceof Collection) {
            if (!ctx._source.begin_wtime.contains('%{wtime}')) {
              ctx._source.begin_wtime.add('%{wtime}');
            }
          } else {
            ctx._source.begin_wtime = ['%{wtime}'];
          }
          ctx._source.begin_count = ctx._source.begin_wtime.size();
        }
      "
      script_lang => "painless"
    }
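
For example, would registering the script once as a stored script and referencing it from the output be an option? This is just an idea I have not tested. The script id append_begin_wtime is made up, and it assumes the event is passed to the script as params.event (the script_var_name default of the elasticsearch output); the triple-quote syntax is Kibana Dev Tools:

PUT _scripts/append_begin_wtime
{
  "script": {
    "lang": "painless",
    "source": """
      def wtime = params.event.get('wtime');
      if (wtime != null) {
        if (!(ctx._source.begin_wtime instanceof Collection)) {
          ctx._source.begin_wtime = new ArrayList();
        }
        if (!ctx._source.begin_wtime.contains(wtime)) {
          ctx._source.begin_wtime.add(wtime);
        }
        ctx._source.begin_count = ctx._source.begin_wtime.size();
      }
    """
  }
}

The output would then reference it with script => "append_begin_wtime" and script_type => "indexed", so the script source never changes per event.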

Thank you

Can you provide an example of what you are trying to achieve with that script? What transformation is done to your event?

I think it would be better performance-wise to do that on the Logstash side, not on the Elasticsearch side.

@leandrojmp
Hi! Thank you for your reply.
We have a unique doc for each member.
When a member accesses the site, we want to keep appending an access (begin) log entry to that member's doc.

Currently, we're doing the conversion in Logstash.

Below is our current mapping

      "prod_view": {
        "type": "nested",
        "properties": {
          "prod_code": {
            "type": "keyword"
          },
          "wtime": {
            "type": "date"
          }
        }
      },
      "begin_wtime": {
        "type": "keyword",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },

Please share a document sample and what it looks like before and after the script.

Just reading the script it is not clear enough what you are doing or if you can do everything in Logstash.

It looks like you are just adding the field begin_wtime with the value of the field wtime and then adding a field named begin_count with the size of begin_wtime?

If it is just that, I'm not sure why you would use an Elasticsearch script when you are already using Logstash.

I'm not fluent in English, so it took me a while to understand.

Below is the initial document I indexed, before the script runs.

    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "SITE_1_m2021111",
        "_score" : 0.0,
        "_routing" : "SITE_1",
        "_source" : {
          "member_uid" : "test@gmail.com"
        }
      }
  ]

The current pipeline fetches the access (begin) log from the RDBMS, and the script appends each date (Y-m-d) to begin_wtime.

Then it takes the size of begin_wtime and puts it in begin_count.
This is what the final document looks like after the script has run.

    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "SITE_1_m2021111",
        "_score" : 0.0,
        "_routing" : "SITE_1",
        "_source" : {
          "member_uid" : "test@gmail.com",
          "begin_wtime" : [ // script
            "2023-05-29",
            "2023-05-30",
            "2023-05-31"
          ],
          "begin_count" : 3 // script
        }
      }
  ]

Is there a better way to do this?

Thank you.

Additionally, is this the part where you would recommend Logstash's ruby filter?

But where is this coming from? You didn't share where this data is sourced from, you only shared your Elasticsearch output. You would need to provide your complete Logstash configuration.

You probably can do that entirely on Logstash using mutate filters or maybe some ruby code if you can't solve it with a couple of mutates.

I don't think your painless script can be cached, as its source changes according to the value of %{wtime}, but I do not use Elasticsearch scripts that much.
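
If the script has to stay on the Elasticsearch side, one thing that might avoid the recompilation is keeping the script source constant and reading the date from params instead of interpolating %{wtime} into it. This is an untested sketch; it assumes the elasticsearch output exposes the event to the script as params.event (the script_var_name default):

elasticsearch {
  # ... same hosts/index/document_id/routing/action settings as your output ...
  script_lang => "painless"
  script_var_name => "event"   # event fields become available as params.event
  # this source never changes per event, so it should compile once and stay cached
  script => "
    def wtime = params.event.get('wtime');
    if (wtime != null) {
      if (!(ctx._source.begin_wtime instanceof Collection)) {
        ctx._source.begin_wtime = new ArrayList();
      }
      if (!ctx._source.begin_wtime.contains(wtime)) {
        ctx._source.begin_wtime.add(wtime);
      }
      ctx._source.begin_count = ctx._source.begin_wtime.size();
    }
  "
}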

The thing is, if you are using Logstash it would be better to do all your transformations on Logstash side.

Let me share the full Logstash configuration.

In MySQL, we access the begins table and run a query to fetch all the logs.

Instead of having all the dates in one row, there is a separate row for each accessed date.

Below is an example of the begins table

member_uid | wtime
test@gmail.com | 2023-05-29
test@gmail.com | 2023-05-30
test@gmail.com | 2023-05-31

input {
  jdbc {
    id => "begins.v1-input"
    type => "begins.v1"
    jdbc_driver_library => "/usr/share/java/mysql-connector-java-5.1.49-bin.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    .
    .
    .
  }
}

filter {
  mutate {
    id => "begins.v1-mutate"
    copy => { "pk" => "[@metadata][_id]" }
    copy => { "site_code" => "[@metadata][_routing_id]" }
    add_field => { "[@metadata][_index_name]" => "test" }
    add_field => { "temp_wtime" => "%{wtime}" }
  }
}

I want to find the rows corresponding to the uid and continuously append them to the begin_wtime field.

output {
  elasticsearch {
      id => "begins"
      hosts => ["http://es:9210"]
      index => "%{[@metadata][_index_name]}"
      document_id => "%{[@metadata][_id]}"
      routing => "%{[@metadata][_routing_id]}"
      action => "update"
      doc_as_upsert => false
      script => "
        if (ctx._source.temp_wtime != null) {
          if (ctx._source.begin_wtime == null) {
            ctx._source.begin_wtime = new ArrayList();
          }

          if (ctx._source.begin_wtime instanceof Collection) {
            if (!ctx._source.begin_wtime.contains('%{wtime}')) {
              ctx._source.begin_wtime.add('%{wtime}');
            }
          } else {
            ctx._source.begin_wtime = ['%{wtime}'];
          }
          ctx._source.begin_count = ctx._source.begin_wtime.size();
        }
      "
      script_lang => "painless"
    }
}

Is there any way to fetch the DB values on the Logstash side and append them to the existing begin_wtime?

You are already doing that with the jdbc input.

Probably yes, but you would need to share what your document looks like. For example, you have this in your database:

member_uid | wtime
test@gmail.com | 2023-05-29
test@gmail.com | 2023-05-30
test@gmail.com | 2023-05-31

But what does this data from the jdbc input look like in Logstash? Do you have one event for each row, or are you grouping them in your select? It is not clear since you omitted your query.

Can you add an extra file output and share what Logstash outputs?

output {
    file {
        path => "/tmp/sample-document.json"
    }
}

Then share the output; this way it will be possible to see what the document looks like in Logstash.

Sorry for the late reply.

Below is part of the contents of sample-document.json, as you suggested.

{"@timestamp":"2023-11-18T14:22:45.745Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-12"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-12"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-15"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-15"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-16"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-16"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-16"}
{"@timestamp":"2023-11-18T14:22:45.746Z","site_code":"SITE_1","type":"begins.v1","wtime":"2023-05-16"}

But where are the other fields that you have in your logstash configuration, like pk, site_code, member_uid and temp_wtime?

My mistake.
This is the final JSON result

{"site_code":"SITE_1","member_uid":"test@gmail.com","wtime":"2023-05-12","@timestamp":"2023-11-18T14:33:10.565Z","log_idx":1,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-12","pk":"1"}
{"site_code":"SITE_1","member_uid":"test@gmail.com","wtime":"2023-05-12","@timestamp":"2023-11-18T14:33:10.566Z","log_idx":2,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-12","pk":"1"}
{"site_code":"SITE_1","member_uid":"test2@gmail.com","wtime":"2023-05-15","@timestamp":"2023-11-18T14:33:10.566Z","log_idx":3,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-15","pk":"2"}
{"site_code":"SITE_2","member_uid":"test2@gmail.com","wtime":"2023-05-15","@timestamp":"2023-11-18T14:33:10.566Z","log_idx":4,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-15","pk":"2"}
{"site_code":"SITE_1","member_uid":"test@gmail.com","wtime":"2023-05-16","@timestamp":"2023-11-18T14:33:10.566Z","log_idx":5,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-16","pk":"1"}
{"site_code":"SITE_1","member_uid":"test@gmail.com","wtime":"2023-05-16","@timestamp":"2023-11-18T14:33:10.567Z","log_idx":6,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-16","pk":"1"}
{"site_code":"SITE_1","member_uid":"test3@gmail.com","wtime":"2023-05-16","@timestamp":"2023-11-18T14:33:10.567Z","log_idx":7,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-16","pk":"3"}
{"site_code":"SITE_1","member_uid":"test@gmail.com","wtime":"2023-05-16","@timestamp":"2023-11-18T14:33:10.570Z","log_idx":8,"@version":"1","type":"begins.v1","temp_wtime":"2023-05-16","pk":"1"}

The ID of the actual document is made up of the member_uid.
The goal is to ignore a wtime that already exists for that member_uid, and to append it when the date isn't already included.

I think I've gotten the hang of it.

I tried using Logstash's aggregate filter and got it to work.


The main issue is that both Logstash and Elasticsearch are event based, where each event is independent from the previous or next event.

I'm not sure how your Elasticsearch script is working because I do not have much experience with painless scripts.

To do something similar in Logstash, where your event depends on the previous and next event, you would need two things:

  • Make sure that your jdbc input is sorted by the pk
  • Use the aggregate filter, which needs to run with just one worker (see the note after the example below).

The following aggregate filter would work:

aggregate {
  task_id => "%{pk}"
  code => "
    # accumulate all wtime values for the same pk into one map
    map['pk'] ||= event.get('pk')
    map['site_code'] ||= event.get('site_code')
    map['member_uid'] ||= event.get('member_uid')
    map['begin_wtime'] ||= []
    map['begin_wtime'] << event.get('wtime')
    event.cancel()
  "
  # when a new pk arrives, push the previous map as a new event
  push_previous_map_as_event => true
  timeout => 30
  # runs on the pushed event: de-duplicate the dates and add the count
  timeout_code => "
    event.set('begin_wtime', event.get('begin_wtime').uniq)
    event.set('begin_count', event.get('begin_wtime').length())
  "
}
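
One extra note about the single worker requirement: the aggregate filter needs events for the same task_id to be processed in order by a single worker thread, so the pipeline running it should have pipeline.workers set to 1. For example, in pipelines.yml (the pipeline id and config path below are placeholders):

- pipeline.id: begins
  path.config: "/etc/logstash/conf.d/begins.conf"
  pipeline.workers: 1

Alternatively, you can start Logstash with -w 1.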

Yeah, that is the approach. I was able to achieve a similar output with the aggregate filter, but the input needs to be ordered first; I think you can use an ORDER BY in your SELECT on the JDBC input to do that, for example:
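
The statement below is only an illustration since the real query was omitted; the column list is a guess based on the fields in your events:

jdbc {
  # ... same driver and connection settings as your input ...
  statement => "SELECT pk, site_code, member_uid, wtime FROM begins ORDER BY pk"
}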

Oh, thank goodness it's the same way!

Thank you so much for your help.

Have a nice day 🙂

@leandrojmp

I hadn't thought of that part.

My understanding is that if a new date comes in after Logstash or the machine restarts, all of the dates entered before that point will be reset.

Do you have any advice on how to solve that?

I also found this topic: Script params via Logstash, is it possible? - #6 by Luca_Belluccini

Unfortunately no. As mentioned before, Logstash is event based and every event is considered independent from the others. In some cases you can use the aggregate filter to combine information from multiple events, but this may not always work as expected or may not scale well.

For example, in your case you would always need to query your entire database; you can't use a tracking column, because you need to aggregate over all rows for a specific pk value.

What do you want to achieve with Elasticsearch? Could you change your approach?

For example, why do you need to have the wtime values in an array plus a field with the count? Could you not index each database row as an individual document and use query aggregations on the Elasticsearch side? For example:
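
This is only a sketch of the idea; it assumes each row of the begins table is indexed as its own document in an index named begins, with member_uid mapped as keyword and wtime as date:

GET begins/_search
{
  "size": 0,
  "aggs": {
    "per_member": {
      "terms": { "field": "member_uid" },
      "aggs": {
        "begin_days": { "cardinality": { "field": "wtime" } }
      }
    }
  }
}

That would give the number of distinct access dates per member at query time, instead of maintaining begin_wtime and begin_count inside each document.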

From my experience, the approach when using Elasticsearch should be different from when you use a normal RDBMS, as Elasticsearch is a document store and every document is independent of the others.

Some things that are pretty easy to do in an RDBMS are pretty hard, or not possible, in Elasticsearch.

In your case, if you need to keep track of new dates for a specific id and keep the document in Elasticsearch updated, you may need to write something to do that, like an external script in Python or another language.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.