Query Relational DB based on a field value in Logstash?

I can parse the log files using Logstash. Now I want to further process one of the fields just parsed: I want to gather additional information for that field, but that information is stored in a relational database. Is it possible to merge data from the two sources in one go?

Performing a per-record lookup against a relational database is not possible, and would also be very slow. What you can do instead is create a lookup file based on the data in your relational database and use the translate plugin to enrich your events based on this data.
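For example, one way to generate that lookup file (this is only a sketch; the table, columns, connection string, and driver details are placeholders you would have to adapt) is a one-off Logstash run with the jdbc input plugin that writes one key-value line per row:

input {
  jdbc {
    # Connection details below are placeholders for your own database.
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # One row per dictionary entry: the lookup key plus the values to enrich with.
    statement => "SELECT id, name, category, page FROM products"
  }
}

output {
  file {
    path => "./translate.data"
    # Write each row as '"key": "value1,value2,..."', the same layout used by
    # the dictionary example below.
    codec => line {
      format => '"%{id}": "%{name},%{category},%{page}"'
    }
  }
}

Any script or export tool that produces the same key-value layout works just as well; the point is simply to flatten the database rows into a file the translate filter can load.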

Hi Christian_Dahlqvist,

Thank you! This is very helpful. I tried to look for some references on how I could manipulate/enrich data using a dictionary and the translate plugin, but I could not find much information. Could you please help me with some pointers on how to access dictionary file elements in the Logstash configuration file?

Sorry for the hassle.

Best regards,

You can create a dictionary file based on the content of your relational database. This can be a simple file containing key-value pairs, where each value can be a single string, a set of CSV fields, or even a JSON document. I have created a simple example based on the following file, named 'translate.data':

"key1": "1,2,3"
"key2": "4,5,6"

The following simple configuration file shows how it can be used.

input {
  stdin {}
}

filter {
  translate {
    field => "message"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }

  csv {
    source => "[@metadata][csv]"
    columns => ["a","b","c"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

If I now run the command echo key2 | ./bin/logstash -f ./translate.conf, I get output that looks something like this, where the data has been retrieved based on the message field and the value parsed as CSV:

{
       "message" => "key2",
      "@version" => "1",
    "@timestamp" => "2016-02-22T12:00:54.747Z",
          "host" => "localhost",
             "a" => "4",
             "b" => "5",
             "c" => "6"
}

Hi Christian,

Thank you so much! Yes, I could install the plugin and create a test CSV file, but I keep getting a bad syntax error. I know this is like asking for too much...
This is what a sample record looks like:
"q12345": "This is a test, column2, test column3, 1234, /abc/def/ghi/jkl.htm, some dummy value"

If I can get the syntax right, I might be able to echo the value at the prompt. But when I am inside a logstash.conf file, parsing a log file, and I reach a path column like /ab/def/ghi, how does that call work? Do I call the other conf file from within a conf file, using the field just accessed? I want to add q12345 wherever I come across '/def/' in the path column...

Thank you for your patience and guidance.

Best regards,

You need to parse the log file so that the key you will use for the lookup ('key2' in my simple example) ends up in its own field, and then use that field to load the data you want to enrich with from the file using the translate plugin. I do not understand your example or what you are trying to do.
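A minimal sketch of that pattern, dropped into the same stdin/stdout skeleton as the example above (the log line layout and field names are only illustrative, e.g. a line like 'q12345 /some/value/totest 200'):

filter {
  # Pull the lookup key out of the raw line into its own field.
  grok {
    match => { "message" => "^%{NOTSPACE:lookup_key} %{URIPATH:request_path} %{NUMBER:status}" }
  }

  # Use that field to load the matching entry from the dictionary file.
  translate {
    field => "lookup_key"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }

  # Split the looked-up CSV string into named fields, as before.
  csv {
    source => "[@metadata][csv]"
    columns => ["a","b","c"]
  }
}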

I have a log file that I am parsing for log analysis, and I am using Logstash to do that. The grok filter splits the data as I need it. I am splitting the path field and dynamically adding a new column for each path level. So, if I have a record like 'someid', '/some/value/totest', it would be split into four columns: id, pathlevel1, pathlevel2, pathlevel3. But for the data to make sense, it must be enriched with the values that we have in a relational DB. So, when I get a field value using grok, I want to use that value to fetch the additional values from the DB, which is what you suggested the translate plugin for. When grok gives me the path column value, how do I make a call to the dictionary file to fetch the related values for the value that is being processed at that moment?
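For reference, one way to express that dynamic path splitting (only a sketch; the request_path field name is assumed, and the original grok-based approach works just as well) is a ruby filter that creates one pathlevelN field per segment, using the Logstash 2.x event API:

filter {
  ruby {
    code => "
      # Split '/some/value/totest' into ['some', 'value', 'totest'] and
      # create pathlevel1, pathlevel2, ... for each segment.
      # 'request_path' is an assumed field name.
      parts = event['request_path'].to_s.split('/').reject { |p| p.empty? }
      parts.each_with_index do |part, i|
        event['pathlevel' + (i + 1).to_s] = part
      end
    "
  }
}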

Best regards,

If I understand you correctly, you want to enrich based on a variable number of fields, depending on the number of components in the path? I am not sure this can be done using the translate plugin.

I am splitting the path for my convenience, to project the data in Kibana by path level. But yes, I do need to enrich the data based on a substring of the path /some/value/totest: I need to extract /value/ and use that to query the dictionary for additional information.
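If it is always the second path segment that identifies the record, a grok capture of just that segment can feed the translate filter. A sketch, with illustrative field names:

filter {
  # Capture only the second path segment, e.g. 'value' from '/some/value/totest'.
  grok {
    match => { "request_path" => "^/[^/]+/(?<lookup_key>[^/]+)/" }
  }

  # Look the segment up in the dictionary, exactly as in the earlier example.
  translate {
    field => "lookup_key"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }
}

The looked-up CSV string can then be split into named fields with the csv filter, as shown earlier.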

Best regards,

Sorry to bug you again!

A quick query, if I may -

I cannot get the fields from the dictionary as you suggested. I created a file in the same CSV format as suggested, for example:
"q12345": "This, is, a test, /record/dummy, 12345"

Yet I do not see the field values in the output; the values end after host. In the example I shared, the filter part looks like this:
csv {
  source => "[@metadata][csv]"
  columns => ["col1","col2","col3","col4","col5","col6"]
}

I am not sure what I could be missing. If there is someplace else I could direct my query to, I'd be happy to do that.

Best regards,

What does your translate filter config look like? Which field holds the key (q12345) that you are looking up?

This is what my test filter config file looks like:

input {
  stdin {}
}

filter {
  translate {
    field => "message"
    destination => "[@metadata][csv]"
    dictionary_path => "C:/logstash-2.1.1/bin/db_test.csv"
  }

  csv {
    source => "[@metadata][csv]"
    columns => ["PRODUCT_NAME","CATEGORY","PAGE","ALIAS"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Please note that there is no mapping for the dummyID column in the config file.
The data in the CSV follows this format:

"dummyid1": "Prod_name1_here","category1_here","page1_here","alias1_here"
"dummyid2": "Prod_name2_here","category2_here","page2_here","alias2_here"

Best regards,

Hi Ravinder,

I came across the same issue as yours and solved it using the method below.

Logstash translate filter:

translate {
  field => "card_bin"
  dictionary_path => "/opt/bin_mapping_1.yaml"
  exact => true
}

where "card_bin" acts as my key in YAML
Its corresponding value will be populated in the destination field which is "translation" by default.
YAML file sample -

"481508": HDFC
"489519": HDFC
"546094": HDFC
"478024": HDFC

  • Integers need to be in quotes in the YAML config.
  • YAML takes indentation very seriously! (Notice the space (' ') after the ':'.)
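
For reference, a minimal end-to-end sketch using that YAML file might look like this (the issuing_bank field name is only an illustration; if destination is omitted, the value ends up in a field called translation):

input {
  stdin {}
}

filter {
  # Treat each stdin line as the BIN to look up.
  mutate {
    rename => { "message" => "card_bin" }
  }

  translate {
    field => "card_bin"
    dictionary_path => "/opt/bin_mapping_1.yaml"
    destination => "issuing_bank"
    exact => true
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Saving this as, say, bin_lookup.conf and running echo 481508 | ./bin/logstash -f bin_lookup.conf should then produce an event containing issuing_bank => "HDFC".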