JDBC_Static Filter

I am pulling data from a database that has fields populated with numeric values. I need to perform subsequent lookups against additional tables to convert each numeric value to a named value. Since I'll be doing this frequently, I want to minimize the number of SQL calls, which it looks like the jdbc_static filter would allow me to do. Is that right?

I've configured the loaders fine, but I have a couple of questions regarding local_db_objects and local_lookups.

  1. I don't understand the purpose of the index_columns setting. Is this like the table's primary key?
  2. Do local queries have to be specified in the same call of the filter, or can they be specified later? For instance, does this work?
filter {
  jdbc_static {
    loaders => [
      {
        query => "SELECT id,value FROM Contact"
        local_table => "contact_type"
      }
    ]
    local_db_objects => [
      {
        name => "contact_type"
        index_columns => [ "id" ]
        columns => [
          [ "id", "SMALLINT" ],
          [ "value", "varchar(20)" ]
        ]
      }
    ]
  }
  if [contact] {
    jdbc_static {
      local_lookups => [
        {
          query => "select name from contact_type WHERE value = :contact"
          target => "contact"
        }
      ]
    }
  }
}
  1. When a local_db_object is loaded, an index is built on each of the columns in the index_columns array. You do not have to have a primary (unique) key -- the lookup can return an array of hashes.
  2. I think all the data loaded by the filter has instance scope, so data loaded in one jdbc_static filter would not be visible in another.

  1. I'm not sure what most of what you said on this point means, lol. Can you dumb it down to, say, the level of a moderately smart chimp?
  2. Actually, I was thinking about the implementation wrong, so it's fine really to just have a single scope. The documentation does say something about being multi-pipeline aware, though I haven't fully read into that.

When you use a local_db_object, the filter executes the query defined by the loaders option (either as a one-off at startup, or possibly on a schedule). The result set from the query is loaded into an in-memory Apache Derby database. That avoids having to contact the source database for every event. If the source DB were in another data centre, that could mean a delay of tens of milliseconds for each event, which would be a big performance problem.

The index_columns option tells the Derby database which columns should be indexed. Having an index on a column greatly speeds up lookups on that column (but not lookups whose WHERE clauses use other columns).

If you take a look at the description of these options in the documentation, there is an example of each of them.

There are two loaders. One loads ip and descr from a table to create the "servers" table in Derby. The other loads firstname, lastname, and userid into a Derby table called "users". I suspect the "order by" clauses on these queries are micro-optimizations and you should not worry about them until you have the basic functionality working.

The local_db_objects option says "servers" should be indexed by "ip", and "users" should be indexed by "userid".

Note! The local_lookups option configures two queries against the local_db_objects. The local-servers query does a lookup against the Derby "servers" table, queried by ip, which is the field that was indexed in Derby. Likewise, the local-users query does a lookup against the Derby "users" table, queried by userid, which, again, is the field that was indexed.
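
To make the shape of that example concrete, the configuration looks roughly like this. This is a sketch from memory, not copied from the docs -- the remote table names, column types, and the event fields used in the parameters are placeholders, so check the documentation for the exact values:

jdbc_static {
  loaders => [
    {
      # one-off (or scheduled) query against the remote DB that builds the Derby "servers" table
      # the "order by" is the micro-optimization mentioned above
      query => "select ip, descr from remote_servers order by ip"
      local_table => "servers"
    },
    {
      # builds the Derby "users" table
      query => "select firstname, lastname, userid from remote_users order by userid"
      local_table => "users"
    }
  ]
  local_db_objects => [
    {
      name => "servers"
      index_columns => [ "ip" ]        # index the column used in the lookup's WHERE clause
      columns => [
        [ "ip", "varchar(15)" ],
        [ "descr", "varchar(255)" ]
      ]
    },
    {
      name => "users"
      index_columns => [ "userid" ]
      columns => [
        [ "firstname", "varchar(50)" ],
        [ "lastname", "varchar(50)" ],
        [ "userid", "int" ]
      ]
    }
  ]
  local_lookups => [
    {
      # the "local-servers" lookup: queries the indexed ip column
      query => "select descr as description from servers WHERE ip = :ip"
      parameters => { "ip" => "[from_ip]" }
      target => "server"
    },
    {
      # the "local-users" lookup: queries the indexed userid column
      query => "select firstname, lastname from users WHERE userid = :userid"
      parameters => { "userid" => "[userid]" }
      target => "user"
    }
  ]
}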

For your local_lookups, you are trying to select a column called "name" from the Derby contact_type table, but that only has columns called "id" and "value". If you are trying to look up a field called contact and overwrite it with the contents of the "value" column, I think that would be

  local_lookups => [
    {
      query => "select value from contact_type WHERE id = :contact"
      target => "contact"
    }
  ]

"id" matching the indexed field. I am not sure if you need

parameters => {contact => "[contact]"}

or the filter matches names by default. Did you want target => "name"? I am also not sure how the filter handles overwriting a field. It is not impossible that it will turn it into an array containing both values. You would need to experiment.
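
Putting those pieces together, a sketch of the whole lookup would be along these lines. The parameters option is included on the assumption that it is needed, which, as above, I am not certain of:

  local_lookups => [
    {
      query => "select value from contact_type WHERE id = :contact"
      # explicit mapping of the :contact bind variable to the event's [contact] field --
      # included in case the filter does not match them by name automatically
      parameters => { "contact" => "[contact]" }
      target => "contact"
    }
  ]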


Once I started the pipeline, I saw that specifying an index column would improve performance. Unfortunately, I've run into another issue that I'm trying to sort out. I'm using the exact same settings and statement that work fine in the jdbc input.

loaders => [
  {
    query => "use DB select id,value from table"
    local_table => "contact_type"
  }
]
Error generated:

Exception occurred when executing loader Jdbc query count {:exception=>"Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near the keyword 'use'."

If I remove use DB, I get a different error

Exception occurred when executing loader Jdbc query count {:exception=>"Java::ComMicrosoftSqlserverJdbc::SQLServerException: Invalid object name 'table'."

For the sake of completeness, here's the full jdbc_static configuration

jdbc_static {
    jdbc_driver_library => "d:/Logstash/data/sqljdbc_9.2/mssql-jdbc-9.2.0.jre11.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://SERVER_FQDN:1433"
    jdbc_user => "user"
    jdbc_password => "password"
    loader_schedule => "0 */1 * * *"
    loaders => [
      {
        query => "use DB select id,value from table"
        local_table => "contact_type"
      }
    ]
    local_db_objects => [
      {
        name => "contact_type"
        columns => [
          [ "id", "SMALLINT" ],
          [ "value", "varchar(20)" ]
        ]
      }
    ]
    local_lookups => [
      {
        query => "SELECT value FROM contact_type WHERE id = :id"
        parameters => { "id" => "[contacttype]"}
        target => "contacttype"
      }
    ]
}

I think you have to specify the database name in the connection string.

Unfortunately, it doesn't like that either.

LogStash::Filters::Jdbc::ConnectionJdbcException: Java::ComMicrosoftSqlserverJdbc::SQLServerException: The port number 1433/DB is not valid.

Shouldn't that be a semicolon?

jdbc:sqlserver://localhost:1433;databaseName=AdventureWorks

Woot! That was the issue; it's now working.

One last issue with the JDBC filter that I see. The returned value from the local lookup is an object, but I was expecting a plain value. An event comes into Logstash with the field contacttype set to 3, and the jdbc_static filter puts the following into the elasticsearch output.

{
  value=phone
}

Again, here's my config:

    loaders => [
      {
        query => "select id,value from Table"
        local_table => "contact_type"
      }
    ]
    local_db_objects => [
      {
        name => "contact_type"
        index_columns => [ "id" ]
        columns => [
          [ "id", "int" ],
          [ "value", "varchar(20)" ]
        ]
      }
    ]
    local_lookups => [
      {
        query => "SELECT value FROM contact_type WHERE id = :id"
        parameters => { "id" => "[contacttype]"}
        target => "contacttype"
        default_hash => {
          "contacttype" => "null"
        }
      }
    ]
  }

I think that is working as expected. In the description of lookups it says "Any rows are converted to Hash objects and are stored in a target field that is an Array".

If you read through the full example in that section, the lookup

local_lookups => [
  {
    query => "select descr as description from servers WHERE ip = :ip"
    parameters => {ip => "[from_ip]"}
    target => "server"
  }
]

results in an array of hashes

    "server" => [
    [0] {
        "description" => "Payroll Server"
    }
],

It pretty much has to do that since you can SELECT multiple columns and/or multiple rows. I expect nearly everyone is selecting a single row of a single column so

   "server" =>  "Payroll Server"

would work better for them, but if the [server] field were a string on some events, an array on others, and a hash on others then elasticsearch would drop documents with a mapping exception.
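
If you do want a plain scalar in the event, one option (an untested sketch, assuming the lookup writes into [contacttype] as in your configuration) is to copy the first row's value out of the array with a mutate filter placed after the jdbc_static filter:

  # flatten [contacttype] => [ { "value" => "phone" } ] into a plain string field
  mutate {
    replace => { "contacttype" => "%{[contacttype][0][value]}" }
  }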

Yeah, I saw that and it makes sense. I'm using this more like you'd use a translate filter. I can't use translate in this case because some of the fields I am doing this lookup on may have additional values added at any time.

Thanks for all the help @Badger
