Jdbc_static left joins in local_lookups

I have connected to a database via the logstash jdbc_static filter and the jdbc input.
I have loaded the tables and defined the local_db_objects entries.

I have created a query in local_lookups, and when I simply join the tables together everything is fine and data is flowing. But when I change it to a left join, logstash stalls and nothing happens for minutes (the left-join variant is sketched after the config below).

    local_lookups => [ 
        {
        id => "rawlogfile"
        query => "
            SELECT
            l_AE.datetime_
            , l_AP.role_
            , l_AP.reference_
            , l_AE.type_
            , l_Org.name_
            , l_AP.code_
            , l_AE.myUnqualifiedId
            FROM  l_AE 
            JOIN l_AP ON l_AP.myElementSpecificId_AuditEventFHIR = l_AE.myUnqualifiedId AND l_AP.myUnqualifiedVersionId_AuditEventFHIR = l_AE.unqualifiedversionid__
            JOIN l_Org ON l_AP.reference_ = l_Org.myUnqualifiedId
            "  
        target => "sql_output"
        }
    ]
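For reference, a minimal sketch of the stalling variant, which only differs in the join type (the SELECT list is unchanged):

    FROM l_AE
    LEFT JOIN l_AP ON l_AP.myElementSpecificId_AuditEventFHIR = l_AE.myUnqualifiedId AND l_AP.myUnqualifiedVersionId_AuditEventFHIR = l_AE.unqualifiedversionid__
    LEFT JOIN l_Org ON l_AP.reference_ = l_Org.myUnqualifiedId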

In the terminal I see the following:

    [2019-08-06T09:39:13,865][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.8.0"}
    [2019-08-06T09:39:23,256][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
    [2019-08-06T09:39:23,287][INFO ][logstash.filters.jdbcstatic] derby.system.home is: C:\Users\N1XERW
    [2019-08-06T09:39:27,147][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR, fetched 186311 records in: 2.391 seconds
    [2019-08-06T09:39:29,960][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR, saved fetched records to import file in: 2.813 seconds
    [2019-08-06T09:39:31,381][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR, imported all fetched records in: 1.421 seconds
    [2019-08-06T09:39:40,495][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR_Participant, fetched 362087 records in: 9.098 seconds
    [2019-08-06T09:39:43,074][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR_Participant, saved fetched records to import file in: 2.579 seconds
    [2019-08-06T09:39:47,058][INFO ][logstash.filters.jdbc.readwritedatabase] loader AuditEventFHIR_Participant, imported all fetched records in: 3.984 seconds
    [2019-08-06T09:39:47,152][INFO ][logstash.filters.jdbc.readwritedatabase] loader OrganizationFhir, fetched 800 records in: 0.094 seconds
    [2019-08-06T09:39:47,183][INFO ][logstash.filters.jdbc.readwritedatabase] loader OrganizationFhir, saved fetched records to import file in: 0.031 seconds
    [2019-08-06T09:39:47,230][INFO ][logstash.filters.jdbc.readwritedatabase] loader OrganizationFhir, imported all fetched records in: 0.047 seconds
    [2019-08-06T09:39:47,339][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x79511f9e run>"}
    [2019-08-06T09:39:47,386][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
    [2019-08-06T09:39:47,777][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
    C:/Logstash/logstash-6.8.0/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated
    [2019-08-06T09:40:00,456][INFO ][logstash.inputs.jdbc     ] (0.002241s) SELECT * FROM   AuditEventFHIR WHERE myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'
    [2019-08-06T09:41:00,394][INFO ][logstash.inputs.jdbc     ] (0.000984s) SELECT * FROM   AuditEventFHIR WHERE myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'
    [2019-08-06T09:42:00,098][INFO ][logstash.inputs.jdbc     ] (0.000740s) SELECT * FROM   AuditEventFHIR WHERE myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'

From here on I just get a new SELECT * FROM ... line (the jdbc input query) every minute, but only when I have used a LEFT JOIN or a LEFT OUTER JOIN. With a plain inner join logstash works fine, but the data is wrong.

Surely you need a WHERE clause to have the local_lookup use a value from the current event/doc to target the related subset of data?

This is an example of a test config I used to test jdbc_static. Look at these two lines:

    query => "select descr as description from servers WHERE ip = :ip"
    parameters => {ip => "[from_ip]"}

    input {
      generator {
        lines => [
          '{"from_ip": "10.2.3.40", "app": "from-P2", "amount": 22.95, "loggedin_userid": 101}',
          '{"from_ip": "10.2.3.20", "app": "from-P2", "amount": 22.95, "loggedin_userid": 100}',
          '{"from_ip": "10.2.3.30", "app": "from-P2", "amount": 22.95, "loggedin_userid": 101}'
        ]
        count => 1
      }
    }

    filter {
      json {
        source => "message"
      }

      jdbc_static {
        loaders => [
          {
            id => "servers"
            query => "select ip, descr from ref.local_ips order by ip"
            local_table => "servers"
          },
          {
            id => "users"
            query => "select firstname, lastname, userid from ref.local_users order by userid"
            local_table => "users"
          }
        ]
        local_db_objects => [
          {
            name => "servers"
            index_columns => ["ip"]
            preserve_existing => true
            columns => [
              ["ip", "varchar(15)"],
              ["descr", "varchar(255)"]
            ]
          },
          {
            name => "users"
            index_columns => ["userid"]
            columns => [
              ["firstname", "varchar(255)"],
              ["lastname", "varchar(255)"],
              ["userid", "int"]
            ]
            preserve_existing => true
          }
        ]
        local_lookups => [
          {
            query => "select descr as description from servers WHERE ip = :ip"
            parameters => {ip => "[from_ip]"}
            target => "server"
          },
          {
            query => "select firstname, lastname from users WHERE userid = :id"
            parameters => {id => "[loggedin_userid]"}
            target => "user"
          }
        ]
        add_field => { server_name => "%{[server][0][description]}" }
        add_field => { user_firstname => "%{[user][0][firstname]}" }
        add_field => { user_lastname => "%{[user][0][lastname]}" }
        remove_field => ["server", "user"]
        staging_directory => "/elastic/tmp/logstash-6.3.2/data/jdbc_static/import_data"
        loader_schedule => "*/10 * * * * * UTC"
        jdbc_user => "logstash"
        jdbc_password => "***"
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_driver_library => "/elastic/tmp/postgresql-42.1.4.jar"
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/ls_test_2"
      }
    }

    output {
      stdout {
        codec => rubydebug {metadata => true}
      }
    }

The WHERE clause is optional. Logstash will dump the result of the local lookup into the "sql_output" field.
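Because the result lands in "sql_output" as an array of row hashes, individual columns can be copied out the same way the example above does with server and user, e.g. (the org_name field name is just an illustration):

    add_field => { org_name => "%{[sql_output][0][name_]}" }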

The setup I am running pulls 1 record in the input and then enriches it with the query result. The server I am pulling data from has very little processing power. That is why I am pulling the data to the logstash server, doing my data manipulation here, and sending it off to an Elasticsearch server.

    loader AuditEventFHIR, fetched 186311 records in: 2.391 seconds
    loader AuditEventFHIR_Participant, fetched 362087 records in: 9.098 seconds
    loader OrganizationFhir, fetched 800 records in: 0.094 seconds

How is your lookup query not trying to add some huge product of records to your single event if you are not using a WHERE clause to target the records that relate to some property of the single record you get from the input?
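As an untested sketch, and assuming the event carries the AuditEvent id in a field like [myunqualifiedid] (the jdbc input lowercases column names by default), the rawlogfile lookup could be scoped to the current event like this:

    local_lookups => [
        {
        id => "rawlogfile"
        query => "
            SELECT l_AE.datetime_, l_AP.role_, l_AP.reference_, l_AE.type_, l_Org.name_, l_AP.code_, l_AE.myUnqualifiedId
            FROM l_AE
            LEFT JOIN l_AP ON l_AP.myElementSpecificId_AuditEventFHIR = l_AE.myUnqualifiedId AND l_AP.myUnqualifiedVersionId_AuditEventFHIR = l_AE.unqualifiedversionid__
            LEFT JOIN l_Org ON l_AP.reference_ = l_Org.myUnqualifiedId
            WHERE l_AE.myUnqualifiedId = :id
            "
        parameters => { id => "[myunqualifiedid]" }
        target => "sql_output"
        }
    ]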

Thank you guyboertje, your input was valuable, but in the end I found a solution with the jdbc input.
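The gist of that approach (a simplified, untested sketch; the final config is not shown here, and the source table and column names are assumed to mirror the loader and lookup definitions above) is to let the source database do the left joins in the jdbc input statement, so each scheduled run already returns the joined rows:

    input {
      jdbc {
        # connection settings (driver, user, password, connection string) as before
        schedule => "* * * * *"
        statement => "
            SELECT AE.datetime_, AP.role_, AP.reference_, AE.type_, Org.name_, AP.code_, AE.myUnqualifiedId
            FROM AuditEventFHIR AE
            LEFT JOIN AuditEventFHIR_Participant AP ON AP.myElementSpecificId_AuditEventFHIR = AE.myUnqualifiedId AND AP.myUnqualifiedVersionId_AuditEventFHIR = AE.unqualifiedversionid__
            LEFT JOIN OrganizationFhir Org ON AP.reference_ = Org.myUnqualifiedId
            WHERE AE.myUnqualifiedId = '0000134b-fc7f-4c3a-b681-8150068d6dbb'
            "
      }
    }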
