Logstash: How To Add a New Field, Then Populate It Based On Another Field's Value?

Hello Logstash Sorcerers,

I am running Logstash v7.4.0 and Elasticsearch v7.4.0 in a nicely-working pipeline. My chain of fruit stores is sending sales information to Logstash; Logstash then pushes that data to Elasticsearch. Here’s a subsample of my Elasticsearch fruit_sales index, plus a few example data records:

product_code   qnty
====================
100            2039
200             382
300            1028
400             128

This means: I sold 2,039 units of product 100, 382 units of product 200, and so on.

My setup works great, but the trouble is it’s a pain to remember which product code is assigned to which fruit. On a piece of paper in my office, I have a list:

product_code   name
======================
100            apples
200            oranges
300            bananas

The list currently has several hundred items. It does not change much, but I might need to modify it from time to time. Unfortunately, there is no way to modify the data that my stores send into Logstash to include these names.

What I’d love is a solution where I can feed my product_code-to-name list into Logstash; then, as LS processes the raw data coming in from my stores, it adds a new string field to each data record:

product_code   *name*      qnty
===============================
100            "apples"    2039
200            "oranges"    382
300            "bananas"   1028
400            "other"      128

Now each data record has both the numerical product code and a human-readable string with the fruit’s name. This makes checking queries and reading Kibana visualizations much, much easier. Also note that there is a default value ("other") in case a product appears in the sales data but not on the name list.

Another consideration: I don’t think I’ll need to update the product_code-to-name list much, but I can’t have a solution where I would need to stop and restart the LS service whenever I edit that list. Of course, if I have 10,000 data records with a name field already populated, and then, say, I change product_code 100 from "apples" to "papayas," there’s no need to go back and change all existing records. But every new data record from that point on needs to get the new string assignment.

Is there a graceful way to do this without much custom code? In SQL you could join two tables; in C you could build a product_code-to-name hash table; etc.

I assume in my LS config file, I would set up a filter that would do something crude like this:

filter {
  ruby {
    # Bounce each data record to a ruby script which does
    # product_code lookup and adds new field accordingly
    path => "/home/me/addNameField.rb"
  }
}

I don’t have much Ruby experience, but this doesn’t seem that hard.
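For context, here is roughly what I imagine that script would look like. (This is just my sketch: the PRODUCT_NAMES hash is made up, and I'm assuming the register/filter contract that Logstash's ruby filter uses for script files.)

```ruby
# /home/me/addNameField.rb -- sketch of a Logstash "ruby filter" script.
# The lookup table below is illustrative; the real one would have
# several hundred entries.

PRODUCT_NAMES = {
  "100" => "apples",
  "200" => "oranges",
  "300" => "bananas"
}.freeze

# Logstash calls register() once when the pipeline starts.
def register(params)
end

# Logstash calls filter() for every event and expects an array of
# events back.
def filter(event)
  code = event.get("product_code").to_s
  event.set("name", PRODUCT_NAMES.fetch(code, "other"))
  [event]
end
```

But this hard-codes the list into the script, so I'd still have to edit a file on the Logstash box whenever the list changes.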

But I’d like to ask the Forum: (A) is this the best solution? And (B), if I use the Ruby script, is there an example of another Ruby script which adds a new field into a record? I’d rather adapt something that works than reinvent the wheel from scratch.

Any advice on this will be wildly appreciated. Thank you!

FULL DISCLOSURE: I am also posting a variation of this question to the Elasticsearch forum, as this task might be easier on ES than LS.

Sounds like a job for a translate filter. You would have a CSV file containing the mapping of product codes to product names. The filter will re-read the file if it changes.

Wow, thanks Badger,

I just took a crash course on Logstash translate filters, and I think you're right. I have to be careful before I modify a production system, however; do you mind glancing at my solution before I wire it into my LS's config file? This might be a good example for other new people like me, looking to do this in the future.

Again, the objective here is this:

For every data record that Logstash inputs:
   >  Get the value of the "product_code" field
   >  Consult a dictionary to figure out which string corresponds to that product
      code.  (If the dictionary doesn't have that code, the string should be
      "unknown")
   >  Create a new field within the data record called "name"; insert the string
      from the last step there.

Here's my "external dictionary" solution:

   filter {
      translate {
        field => "product_code"
        destination => "name"
        dictionary_path => "/home/me/myNameList.yaml"
        fallback => "unknown"
        refresh_interval => 60
        refresh_behaviour => "replace"
      }
    }

...where...

/home/me/myNameList.yaml
----------------------------------------------------
    "100": apples
    "200": oranges
    "300": bananas

Or, alternately:

filter {
  translate {
    field => "product_code"
    destination => "name"
    dictionary => {
      "100" => "apples"
      "200" => "oranges"
      "300" => "bananas"
    }
    fallback => "unknown"
    refresh_interval => 60
    refresh_behaviour => "replace"
  }
}
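To make sure I understand the semantics, here's the lookup I believe the filter performs for each event, sketched in plain Ruby (my paraphrase, not how the plugin is actually implemented):

```ruby
# Plain-Ruby sketch of the per-event lookup the translate filter
# would do with the dictionary and fallback from the config above.

DICTIONARY = {
  "100" => "apples",
  "200" => "oranges",
  "300" => "bananas"
}.freeze

# Hash#fetch with a default mirrors the fallback option.
def lookup(product_code)
  DICTIONARY.fetch(product_code.to_s, "unknown")
end

lookup("100")  # => "apples"
lookup(400)    # => "unknown"
```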

This looks right on paper... but I dunno... What do you think?

Thanks!

Oh... I also forgot... I need to figure out how to install the Translate and Changelog plug-ins first...? I'm a little confused by the docs. Thanks.

translate is installed by default.

Is product_code a string or integer?

product_code is an Integer

OK, then your external dictionary solution looks fine to me.

Awesome, thanks Badger. My systems are in an automatic backup at the moment, but I will implement once I get a chance. Will update this post so anyone following in my footsteps can benefit. Thanks!

Hi Badger,

It took a few days for my systems to recover from maintenance, but I finally implemented the Logstash filter, as described above. I then looked at my sales data in my Kibana portal, which is how I monitor everything. (I didn't mention before: my Logstash container is one node in a Logstash-Elasticsearch-Kibana pipeline.)

The bad news is I do not see a new data field named "name" when I inspect/explore data in Kibana. The good news is all the data I was seeing before the change is still there, so implementing the translate filter didn't break anything.

But ultimately I need to see the "name" field in Kibana. I know this is the LS forum, and I don't want to turn this post into a Kibana question. But I need to know if my LS translate filter is working properly. Can you recommend a log I could check, or a way to see LS's raw output, so I can verify the "name" field is there? If I can determine that LS is correctly adding my new "name" field, I can troubleshoot the other nodes from there.

Thank you!

PS :: Sorry for the delay in this post, sometimes we can't troubleshoot on production systems as quickly as we'd like.

I would normally use

output { stdout { codec => rubydebug } }

Alternatively, use a file output with a rubydebug codec (or json, if you prefer).

Ah, of course! I'm embarrassed I didn't think of using output to inspect the output. For anyone who might be following this thread, here's the bit I put into the "output" section of my config file:

output {
  file {
    codec => rubydebug { }
    path => "/usr/share/logstash/config/MYLOG.log"
  }
}

Okay, so having inspected the raw output, I see that my new "name" data field has not been added. Evidently, I've misconfigured something in my filter section. Here it is again:

filter {
  translate {
    field => "product_code"
    destination => "name"
    dictionary_path => "/home/me/myNameList.yaml"
    fallback => "unknown"
    refresh_interval => 60
    refresh_behaviour => "replace"
  }
}

Looks fine to me, but this is the first time I've done this, so there must be something I've screwed up configuration-wise. Is there a system log that might tell me why the translate filter is failing?

Also, I didn't mention this before, but my data is structured. I don't think it should matter if LS appends a "name" field at the end of each data record, but maybe the nature of the data itself is a factor here...? Dunno, just grasping.

Thanks!

If the product_code field exists at the top level and the name field does not exist then a translate filter with the fallback option set should always set the name field.

Thanks Badger,

Yeah, in hindsight, maybe I should have provided more information. Whenever I post on the forums, I try to summarize my story so it's easy for readers to focus on my issue.

So my data is structured like this:

{
  "@timestamp" => 2019-11-06T20:39:34.706Z,
  "A" => {
    "AA" => {
      "product_code" => "100",
      ...other stuff...
    },
  }
}

And my actual translate filter is this:

filter {
  translate {
    field => "A.AA.product_code"      <<===========
    destination => "name"
    dictionary_path => "/home/me/myNameList.yaml"
    fallback => "unknown"
    refresh_interval => 60
    refresh_behaviour => "replace"
  }
}

And in the end, I would be happy to get this:

{
  "@timestamp" => 2019-11-06T20:39:34.706Z,
  "A" => {
    "AA" => {
      "product_code" => "100",
      ...other stuff...
    },
  },
  "name" => "apples"
}

But the filter I have in place isn't working. Is that because the filter can't find "A.AA.product_code"? In other words, is this a syntax thing, and I'm not referencing "A.AA.product_code" correctly? Thanks!

Correct. In logstash you would refer to that nested field as [A][AA][product_code]. It does not use the same syntax as Kibana.
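In plain-Ruby terms (a sketch of the semantics, not how Logstash implements it), the bracket reference walks nested hashes, like Hash#dig, whereas the dotted string would be read as one literal key:

```ruby
# The event's data, viewed as a plain Ruby hash:
event = {
  "A" => { "AA" => { "product_code" => "100" } }
}

# [A][AA][product_code] walks the nesting, like Hash#dig:
event.dig("A", "AA", "product_code")  # => "100"

# "A.AA.product_code" would be one literal top-level key,
# which this event does not have:
event["A.AA.product_code"]            # => nil
```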

Yes! You are right! That worked! I can see my new field in Logstash's output, plus in Kibana too. I'm so good, this is most excellent indeed! Thank you!

I will repost the final solution below, for the benefit of anyone who may be reading this post.

So to add a "name" field into my data based on the value of A.AA.product_code (where the data's structure is as described above), I modified my LS config file with the following:

filter {
  translate {
    field => "[A][AA][product_code]"
    destination => "name"
    dictionary_path => "/home/me/myNameList.yaml"
    fallback => "unknown"
    refresh_interval => 60
    refresh_behaviour => "replace"
  }
}

And where the /home/me/myNameList.yaml looks like this:

"100": apples
"200": oranges
"300": bananas

This instructs LS to examine the value of A.AA.product_code in every record and match it against the keys listed in the myNameList.yaml file. If no match is found, LS populates the new "name" field with the string "unknown".

Thanks again Badger! You rock!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.