Query Relational DB based on a field value in Logstash?

I can parse the log files using Logstash. Now I want to further process one of the fields just parsed: I want to gather additional information for that field, but that information is stored in a relational database. Is it possible to merge data from the two sources in one go?

Performing a per-record lookup against a relational database is not possible, and would also be very slow. What you can do instead is create a lookup file based on the data in your relational database and use the translate plugin to enrich your events based on this data.
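For example, one way to generate that lookup file (this is only a sketch; the table, columns, connection string, and driver details are placeholders you would have to adapt) is a one-off Logstash run with the jdbc input plugin that writes one key-value line per row:

input {
  jdbc {
    # Connection details below are placeholders for your own database.
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # One row per dictionary entry: the lookup key plus the values to enrich with.
    statement => "SELECT id, name, category, page FROM products"
  }
}

output {
  file {
    path => "./translate.data"
    # Write each row as '"key": "value1,value2,..."', the same layout used by
    # the dictionary example below.
    codec => line {
      format => '"%{id}": "%{name},%{category},%{page}"'
    }
  }
}

Any script or export tool that produces the same key-value layout works just as well; the point is simply to flatten the database rows into a file the translate filter can load.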

Hi Christian_Dahlqvist,

Thank you! This is very helpful. I tried to look for some references on how I could manipulate/enrich data using a dictionary and the translate plugin, but I could not find much information. Could you please help me with some pointers on how to access dictionary file elements in the Logstash configuration file?

Sorry for the hassle.

Best regards,

You can create a dictionary file based on the content of your relational database. This can be a simple file containing key-value pairs, where each value can be a single string, a set of CSV fields, or even a JSON document. I have created a simple example based on the following file, named 'translate.data':

"key1": "1,2,3"
"key2": "4,5,6"

The following simple configuration file shows how it can be used.

input {
  stdin {}
}

filter {
  translate {
    field => "message"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }

  csv {
    source => "[@metadata][csv]"
    columns => ["a","b","c"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

If I now run the command echo key2 | ./bin/logstash -f ./translate.conf, I get output that looks something like this, where the data has been retrieved based on the message field and the value parsed as CSV:

{
       "message" => "key2",
      "@version" => "1",
    "@timestamp" => "2016-02-22T12:00:54.747Z",
          "host" => "localhost",
             "a" => "4",
             "b" => "5",
             "c" => "6"
}

Hi Christian,

Thank you so much! Yes, I could install the plugin and create a test CSV file, but I keep getting a bad syntax error. I know this is like asking for too much...
This is what a sample record looks like:
"q12345": "This is a test, column2, test column3, 1234, /abc/def/ghi/jkl.htm, some dummy value"

If I can get the syntax right, I might be able to echo the value at the prompt. But when I am inside a logstash.conf file, parsing a log file, and I reach a path column like /ab/def/ghi, how does that call work? Do I call the other conf file from within a conf file, using the field just accessed? I want to add q12345 wherever I come across '/def/' in the path column...

Thank you for your patience and guidance.

Best regards,

You need to parse the log file so that the key you will use for the lookup ('key2' in my simple example) ends up in its own field, and then use that field to load the data you want to enrich with from the file using the translate plugin. I do not understand your example or what you are trying to do.
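A minimal sketch of that pattern, dropped into the same stdin/stdout skeleton as the example above (the log line layout and field names are only illustrative, e.g. a line like 'q12345 /some/value/totest 200'):

filter {
  # Pull the lookup key out of the raw line into its own field.
  grok {
    match => { "message" => "^%{NOTSPACE:lookup_key} %{URIPATH:request_path} %{NUMBER:status}" }
  }

  # Use that field to load the matching entry from the dictionary file.
  translate {
    field => "lookup_key"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }

  # Split the looked-up CSV string into named fields, as before.
  csv {
    source => "[@metadata][csv]"
    columns => ["a","b","c"]
  }
}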

I have a log file that I am parsing for log analysis, and I am using Logstash to do that. The grok filter splits the data as I need it. I am splitting the path field and dynamically adding a new column for each path level. So, if I have a record like 'someid', '/some/value/totest', it would be split into four columns: id, pathlevel1, pathlevel2, pathlevel3. But for the data to make sense, it must be enriched with the values that we have in a relational DB. So, when I get a field value using grok, I want to use that value to fetch the additional values from the DB, which is what you suggested the translate plugin for. When grok gives me the path column value, how do I make a call to the dictionary file to fetch the related values for the value that is being processed at that moment?
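For reference, one way to express that dynamic path splitting (only a sketch; the request_path field name is assumed, and the original grok-based approach works just as well) is a ruby filter that creates one pathlevelN field per segment, using the Logstash 2.x event API:

filter {
  ruby {
    code => "
      # Split '/some/value/totest' into ['some', 'value', 'totest'] and
      # create pathlevel1, pathlevel2, ... for each segment.
      # 'request_path' is an assumed field name.
      parts = event['request_path'].to_s.split('/').reject { |p| p.empty? }
      parts.each_with_index do |part, i|
        event['pathlevel' + (i + 1).to_s] = part
      end
    "
  }
}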

Best regards,

If I understand you correctly, you want to enrich based on a variable number of fields, depending on the number of components in the path? I am not sure this can be done using the translate plugin.

I am splitting the path for my convenience, to project the data in Kibana by path level. But yes, I do need to enrich the data based on a substring of the path /some/value/totest: I need to extract /value/ and use that to query the dictionary for additional information.
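If it is always the second path segment that identifies the record, a grok capture of just that segment can feed the translate filter. A sketch, with illustrative field names:

filter {
  # Capture only the second path segment, e.g. 'value' from '/some/value/totest'.
  grok {
    match => { "request_path" => "^/[^/]+/(?<lookup_key>[^/]+)/" }
  }

  # Look the segment up in the dictionary, exactly as in the earlier example.
  translate {
    field => "lookup_key"
    destination => "[@metadata][csv]"
    dictionary_path => "./translate.data"
  }
}

The looked-up CSV string can then be split into named fields with the csv filter, as shown earlier.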

Best regards,

Sorry to bug you again!

A quick query, if I may -

I cannot get the fields from the dictionary as you suggested. I created a file in the same CSV format as suggested, for example:
"q12345": "This, is, a test, /record/dummy, 12345"

Yet I do not see the field values in the output; the values end after host. In the example I shared, the filter part looks like this:
csv {
  source => "[@metadata][csv]"
  columns => ["col1","col2","col3","col4","col5","col6"]
}

I am not sure what I could be missing. If there is someplace else I could direct my query to, I'd be happy to do that.

Best regards,

What does your translate filter config look like? Which field holds the key (q12345) that you are looking up?

This is what my test filter config file looks like:

input {
  stdin {}
}

filter {
  translate {
    field => "message"
    destination => "[@metadata][csv]"
    dictionary_path => "C:/logstash-2.1.1/bin/db_test.csv"
  }

  csv {
    source => "[@metadata][csv]"
    columns => ["PRODUCT_NAME","CATEGORY","PAGE","ALIAS"]
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Please note that there is no mapping for the dummyID column in the config file.
The data in the CSV follows this format:

"dummyid1": "Prod_name1_here","category1_here","page1_here","alias1_here"
"dummyid2": "Prod_name2_here","category2_here","page2_here","alias2_here"

Best regards,

Hi Ravinder,

I came across the same issue as yours and solved it using the method below.

Logstash translate filter:

translate {
  field => "card_bin"
  dictionary_path => "/opt/bin_mapping_1.yaml"
  exact => true
}

where "card_bin" acts as my key in YAML
Its corresponding value will be populated in the destination field which is "translation" by default.
YAML file sample -

"481508": HDFC
"489519": HDFC
"546094": HDFC
"478024": HDFC

  • Integers need to be in quotes in the YAML config.
  • YAML takes indentation very seriously! (Notice the space (' ') after the ':'.)
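
For reference, a minimal end-to-end sketch using that YAML file might look like this (the issuing_bank field name is only an illustration; if destination is omitted, the value ends up in a field called translation):

input {
  stdin {}
}

filter {
  # Treat each stdin line as the BIN to look up.
  mutate {
    rename => { "message" => "card_bin" }
  }

  translate {
    field => "card_bin"
    dictionary_path => "/opt/bin_mapping_1.yaml"
    destination => "issuing_bank"
    exact => true
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Saving this as, say, bin_lookup.conf and running echo 481508 | ./bin/logstash -f bin_lookup.conf should then produce an event containing issuing_bank => "HDFC".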