Hi,
I'm trying to let Logstash add a field based on a calculation involving the incoming event and data that I have in another index or an external parameter.
How can I access data from any Elasticsearch index in the ruby filter code?
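For illustration, this is roughly what I would like to express in a ruby filter ('another_index.config.rate' is pseudo-code, not real syntax):

filter {
  ruby {
    code => "event.set('new_field', event.get('Installs').to_i * another_index.config.rate)"
  }
}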
Where 'another_index.config.rate' should be data from an Elasticsearch index outside the scope of the Logstash import.
I actually tried performing a GET HTTP request to Elasticsearch from the ruby code and it seems to work, but I feel this is not the right way if it ends up making a GET request for every one of the millions of records that Logstash is importing... This is the code I'm using:

# Needs 'net/http', 'json' and 'uri' (e.g. required once in the filter's init)
uri = URI.parse('http://localhost:9200/test/config/1')
response = Net::HTTP.get_response(uri)
if response.code == '200'
  # Pull the rate out of the looked-up document and combine it with the event
  result = JSON.parse(response.body)
  rate = result['_source']['rate']
  event.set('new_field', event.get('Installs').to_i * rate.to_i)
else
  event.set('new_field', '0')
end
This is what the elasticsearch filter plugin does, but this adds a lot of overhead and reduces throughput as you correctly point out. That will be the case even if you do it through Ruby code.
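For reference, the equivalent lookup with the elasticsearch filter would look roughly like this (hosts, index and field names are placeholders; note the filter sorts on @timestamp by default, so you may need to adjust the sort option for an index without that field):

filter {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test"
    query => "_id:1"
    # Copy the rate field of the matched document onto the current event
    fields => { "rate" => "rate" }
  }
  ruby {
    code => "event.set('new_field', event.get('Installs').to_i * event.get('rate').to_i)"
  }
}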
If the data you are looking up is a limited set that does not change frequently, you could put it in a file and use the translate filter plugin, which stores the data in memory and therefore has a significantly smaller effect on throughput.
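As a sketch (dictionary path and field names are assumptions), that could look like this, with the dictionary file mapping lookup keys to rates:

filter {
  translate {
    field => "config_id"                         # event field whose value is looked up
    destination => "rate"                        # field the looked-up value is written to
    dictionary_path => "/etc/logstash/rates.yml"
    refresh_interval => 300                      # re-read the file every 5 minutes
    fallback => "0"
  }
}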
Might the jdbc_streaming filter help somehow, by reusing the connection and making this better?
There must be an efficient way of doing this, as it looks to me like a very common scenario.
About the translate filter recommendation: it could work, but I actually need data that's coming from a database... I could eventually consume that data from MongoDB instead of Elasticsearch if that helps somehow.
And lastly... in case there is no other way than my code above... the second question is: what happens when Logstash bombards Elasticsearch with that many requests, so many times? How does Elasticsearch react to that?
Issuing a lot of queries will put extra load on the cluster, so I would avoid this for streams with high throughput rates.
For this to be efficient you need a filter that can cache client-side, and unfortunately I do not think the Elasticsearch filter does that yet. There is an open issue, which had some comments not too far back. Maybe @guyboertje can provide some further details?
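In the meantime, one crude way to cache client-side is to do the lookup once in the ruby filter's init block rather than per event (a sketch, assuming the rate changes rarely enough that fetching it once at pipeline start is acceptable):

filter {
  ruby {
    # Fetch the rate once when the pipeline starts and keep it in an instance variable
    init => "
      require 'net/http'
      require 'json'
      require 'uri'
      response = Net::HTTP.get_response(URI.parse('http://localhost:9200/test/config/1'))
      @rate = response.code == '200' ? JSON.parse(response.body)['_source']['rate'].to_i : 0
    "
    code => "event.set('new_field', event.get('Installs').to_i * @rate)"
  }
}

That keeps it at one query per pipeline start instead of one per event, at the cost of not seeing updates to the rate until the pipeline restarts.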
Another option is to run a second LS pipeline that takes the data out of ES with an ES input plugin and writes it to a KV or JSON file that is used by the translate filter in the first LS pipeline.
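A rough sketch of such a second pipeline (index, query, field names and paths are assumptions; translate can read a two-column CSV dictionary, so the file is written as key,value lines):

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test"
    query => '{ "query": { "match_all": {} } }'
    # schedule => "*/5 * * * *"   # optionally re-pull on a cron schedule
  }
}
output {
  file {
    path => "/etc/logstash/rates.csv"
    # One "key,value" line per document, loadable by translate as a CSV dictionary
    codec => line { format => "%{config_id},%{rate}" }
  }
}

Note that the file output appends, so this works best as a one-off run or with the file cleaned up between runs.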
I think this could be very error prone. As it is not possible to control when the translate plugin will refresh, it could end up refreshing halfway through the file being written. I would probably prefer having a script prepare the file and then just replace it when complete.
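For example, a small script along these lines (endpoint, index and paths are placeholders) could build the dictionary in a temporary file and then rename it over the old one, so the translate filter never sees a half-written file:

require 'net/http'
require 'json'
require 'uri'

# Fetch the lookup documents from Elasticsearch
uri = URI.parse('http://localhost:9200/test/config/_search?size=1000')
result = JSON.parse(Net::HTTP.get_response(uri).body)

# Build "key,value" CSV lines from the hits
lines = result['hits']['hits'].map { |hit| "#{hit['_id']},#{hit['_source']['rate']}" }

# Write to a temp file first, then rename; rename is atomic on the same filesystem,
# so the translate filter sees either the old file or the complete new one
tmp = '/etc/logstash/rates.csv.tmp'
File.write(tmp, lines.join("\n") + "\n")
File.rename(tmp, '/etc/logstash/rates.csv')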
That wasn't right, but I also tried with their "version" and I get the same error:
Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Sequel::AdapterNotFound:
java::mongodb.jdbc.MongoDriver not loaded>, :backtrace=>
["/Users/akapit/workspace/outflink/outflink_platform/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/sequel-5.7.1/lib/sequel/adapters/jdbc.rb:44:in `load_driver'",
Well, it is true that we don't have an adapter for MongoDB. AFAIR, when a specific adapter is not found a generic one is used. You may have trouble with the generic adapter because it does vanilla SQL only.
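For what it's worth, against a plain SQL database a jdbc_streaming lookup would look roughly like this (driver, connection string, query and field names are placeholders), and it caches results client-side:

filter {
  jdbc_streaming {
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/config"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    statement => "SELECT rate FROM config WHERE id = :id"
    parameters => { "id" => "config_id" }
    target => "rate"
    # Cached lookups, so repeated ids do not hit the database for every event
    use_cache => true
    cache_expiration => 300.0
    cache_size => 1000
  }
}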