JSON Input vs. Scripted Fields

Hello,

I have a requirement to get the average of the sum of two fields across all documents and visualize it, something like this:

avg("field 1" + "field 2")

I have found two ways to do this and am curious what the difference is, both in terms of best practices and performance:

Method 1: Create a "scripted field" on the index pattern that computes the sum of the two required fields as a new field (called sum_field1_field2), then take the average of "sum_field1_field2" in the metric.

Method 2: Create a JSON input in the metric as follows:

{
  "script": {
    "inline": "doc['field1'].value + doc['field2'].value",
    "lang": "painless"
  }
}

From what I have read, Method 1's script is evaluated in real time when the request is made, so it may be slow to display the visualizations on a large data set. Is the same true for Method 2?

On the other hand, I find Method 2 a bit confusing because I am required to pick an arbitrary field to aggregate against in the metric (for example "field 3"), even though the field doesn't really matter and only the script determines the result.

Just curious if someone has any further input on this. Thanks in advance.

The two approaches are the same thing as far as Elasticsearch is concerned - both calculate the script value on the fly when the request is made. A "scripted field" in Kibana is just configuration that causes Kibana to put your script definition into every query it sends to Elasticsearch.

For example, consider a scripted field configured in the index pattern.
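As a rough sketch, such a definition boils down to something like this (field names here are placeholders, and the exact shape varies by Kibana version):

{
  "name": "sum_field1_field2",
  "type": "number",
  "lang": "painless",
  "script": "doc['field1'].value + doc['field2'].value"
}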

If you go to Discover (or Visualize, where it works the same way), you can inspect the queries Kibana sends by clicking the "Inspect" button in the top nav. Your configured script will show up there - it is simply sent with every request.
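In the inspected request, a scripted field appears under script_fields, roughly like this (again with placeholder names, using the same inline syntax as above):

{
  "script_fields": {
    "sum_field1_field2": {
      "script": {
        "inline": "doc['field1'].value + doc['field2'].value",
        "lang": "painless"
      }
    }
  }
}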

By using the JSON input, you are doing the same thing - embedding your script into the request made for the Visualization. In both cases you have the same potential performance problem because the script might need to run a lot of times in Elasticsearch if your query hits a lot of documents.
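Concretely, the JSON input from Method 2 gets merged into the aggregation Kibana builds, so the request ends up containing something roughly like this (a sketch; "field3" stands in for the arbitrary field picked in the metric):

{
  "aggs": {
    "1": {
      "avg": {
        "field": "field3",
        "script": {
          "inline": "doc['field1'].value + doc['field2'].value",
          "lang": "painless"
        }
      }
    }
  }
}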

Best practice is definitely to avoid scripting altogether if possible. I always see it as a tool to rapidly prototype how your index should be shaped (maybe on a test server). Once you know what you need, you can move this kind of pre-processing to an ingest pipeline and calculate it once per document there, which gives optimal performance for production workloads. Of course that's not true for all cases: if you are just dealing with small data sets, performance might not be relevant for you - it's a flexibility/performance trade-off you can tune for your specific use case.
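For example, a minimal ingest pipeline that pre-computes the sum at index time could look like this (a sketch; the pipeline name is made up, and newer Elasticsearch versions use "source" instead of "inline" in the script processor):

PUT _ingest/pipeline/sum_fields
{
  "description": "Pre-compute field1 + field2 once per document at index time",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "ctx.sum_field1_field2 = ctx.field1 + ctx.field2"
      }
    }
  ]
}

Documents indexed with ?pipeline=sum_fields then carry a plain sum_field1_field2 field, and the metric becomes a simple avg on that field with no scripting at query time.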

On the other hand, I find Method 2 a bit confusing because I am required to pick an arbitrary field to aggregate against in the metric (for example "field 3"), even though the field doesn't really matter and only the script determines the result.

It's a bit hacky, but it can be nice for cases where you only need a scripted field in a single place. By putting it into the visualization definition you avoid polluting the index pattern with lots of single-purpose scripted fields (which would show up everywhere - in Discover, filter dropdowns, and so on).
