How To Add a New Field, Then Populate It Based On Another Field's Value?

Hello Elasticsearch Sorcerers,

I am running Logstash v7.4.0 and Elasticsearch v7.4.0 in a nicely working pipeline. My chain of fruit stores is sending sales information to Logstash, which then pushes that data to Elasticsearch. Here’s a sample of my Elasticsearch fruit_sales index, with a few example data records:

product_code   qnty
====================
100            2039
200             382
300            1028
400             128

This means: I sold 2,039 units of product 100, 382 units of product 200, and so on.

My setup works great, but the trouble is that it’s a pain to remember which product code is assigned to which fruit. On a piece of paper in my office, I have a list:

product_code   name
======================
100            apples
200            oranges
300            bananas

The list currently has several hundred items. It does not change much, but I might need to modify it from time to time. Unfortunately, there is no way to modify the data that my stores send into Logstash to include these names.

What I’d love is a solution where I can load my product_code-to-name list into Elasticsearch. Then, as ES processes the data coming from Logstash, it adds a new string field to each data record:

product_code   *name*       qnty
================================
100            “apples”     2039
200            “oranges”     382
300            “bananas”    1028
400            “other”       128

Now in my index, I have both the numerical product code and a human-readable string with the fruit’s name. This makes checking queries and reading Kibana visualizations much, much easier. Also note that there is a default value (“other”) in case a product appears in the sales data but not on the name list.

Another consideration: I don’t think I’ll need to update the product_code-to-name list much, but I can’t have a solution where I would need to stop and restart the ES service whenever I make an edit to that list. Of course, if I have 10,000 data records with a name field already populated, and then, say, I change product_code 100 from “apples” to “papayas,” there’s no need to go back and update all the existing records. But every new data record from that point on needs to get the new string assignment.

Is there a graceful way to do this without writing custom code? In SQL you could join two tables, in C you could build a product_code-to-name hash table, and so on.

In ES, I assume I would use the “put mapping” API to create the “name” field in my index, and I could create it as a string and set all initial values to “other” to enforce the default value. But I’m baffled as to how I would do the product_code-to-name lookup…?
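For reference, here is roughly the mapping call I had in mind, a keyword field named name on my fruit_sales index (though as far as I can tell, a mapping only declares the field’s type, so the “other” default and the lookup itself would still have to happen somewhere else):

PUT /fruit_sales/_mapping
{
  "properties": {
    "name": { "type": "keyword" }
  }
}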

Any advice on this will be wildly appreciated.

FULL DISCLOSURE: I am also posting a variation of this question to the Logstash forum, as this task might be easier on LS than ES.

For now, Logstash is indeed the better tool for this. However, work is underway to let you do this in Elasticsearch soon, using the new enrich processor together with an enrich policy. The master branch of the documentation will give you a sneak peek of how this will work.
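To give you a rough idea of the shape of it, here is a minimal sketch of how that enrich-based approach is expected to look once it lands (the index, policy, and pipeline names product_names, product-lookup, and add-product-name are placeholders I made up, and the details may still change before release):

# 1. Index the product_code-to-name list into a lookup index
PUT /product_names/_doc/100
{ "product_code": 100, "name": "apples" }

# 2. Create an enrich policy over that index, then execute it
PUT /_enrich/policy/product-lookup
{
  "match": {
    "indices": "product_names",
    "match_field": "product_code",
    "enrich_fields": ["name"]
  }
}

POST /_enrich/policy/product-lookup/_execute

# 3. Reference the policy from an ingest pipeline
PUT /_ingest/pipeline/add-product-name
{
  "processors": [
    {
      "enrich": {
        "policy_name": "product-lookup",
        "field": "product_code",
        "target_field": "product"
      }
    },
    {
      "set": {
        "field": "name",
        "value": "{{product.name}}",
        "if": "ctx.product != null"
      }
    },
    {
      "set": {
        "field": "name",
        "value": "other",
        "override": false
      }
    }
  ]
}

You would then apply the pipeline at index time, for example via the index.default_pipeline setting on fruit_sales or the pipeline option of Logstash’s elasticsearch output. Editing the lookup index and re-running the _execute step refreshes the lookup without restarting anything; existing documents keep their old name, while new documents pick up the change.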

You are correct. I was able to solve this in Logstash; for anyone reading this post, see here.
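For anyone who can’t follow that link, the short version: Logstash’s translate filter can do this kind of static lookup from a dictionary file that it re-reads on an interval, so nothing has to be restarted when the list changes. A sketch of that kind of config (the file path is just an example, and option names vary a bit between translate plugin versions, so check the docs for yours):

filter {
  translate {
    field            => "product_code"
    destination      => "name"
    dictionary_path  => "/etc/logstash/product_names.yml"
    fallback         => "other"       # default when a code isn't in the list
    refresh_interval => 300           # re-read the dictionary file every 300 seconds
  }
}

The dictionary file is just the paper list as YAML; the keys are strings, so if product_code arrives as a number you may need to quote or convert it to match:

"100": "apples"
"200": "oranges"
"300": "bananas"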

Thanks!
