Attempt to divide two fields at the aggregate level causing infinity response

LuigiGalati · February 3, 2022, 1:48pm

I am feeding kibana with an HR dataset from peoplesoft. The data is beginning, end and termination counts by fiscal year. There is no timestamped data used. I need to calculate percent turnover which is Terminations/(Average of Beginning and End Counts). All of my attempts to put the beg and end counts in the denominator seem to be resulting in infinity which leads me to believe that i am not dividing at the aggregate level but at the row level.

I have made three different indexes:

Employee and a single field that has a B, E, T for the type of count the employee is being used
Employee and a field for each of the B, E, T counts with a numerical 1 or 0
Sum of employee B, E, T counts from (2) above.

For the last index, this is my table data:

The avg population column is a scripted field that correctly calculates the average of Beginning and ending counts for any other demographic field i insert (in this case Gender/Sex) which is exactly as i need it.

I have attempted to use some conditional logic to set to zero any calculations that divide by zero but this doesn't work, the row calculations are still being done and not the aggregate calculations.

I don't believe i can use timelion as this is not timestamped data being visualized.

Any thoughts for how i could correctly divide by the aggregated average counts of two other fields or in the case of a single field with B's, E's, and T's in it, can I divide the count of T's by the average of counts of B's and E's?

Tomo_M · February 3, 2022, 10:03pm

I suppose we need to elaborate the problem a little bit more.

What is the most finer data you have? Was the data for three indices made by client side application?
And what is your final goal. To show in table or do you want some visualization?

If you have some failed scripts, sharing the script with its context could be help to answer the problem.

LuigiGalati · February 3, 2022, 10:30pm

I have data at the employee level with either a B, E, or T in a field labeled TYPE (this field indicates if the row with that employee is in the BEGINNING count, the END count or the TERMINATIONS count). or i have the same level of detail but that field called TYPE is now broken out into a BEGINNING field, an END field and a TERMINATIONS field with either a 0 or a 1 value. The ultimate goal is to visualize the data in a line graph. The fiscal year on the X axis, turnover percentages on the Y axis and the ability to put any of the subgroupings (demographic data like sex, salary band, etc) within the graph at will.

Here is my attempt at a JSON script which utilizes the AVG_POP scripted field for the index:

{
 "script":
 "if (avg_pop== 0) 
{return 0} 
else 
{return (doc['NUMBER3_ORCL_ES_ENG'].sum())/(avg_pop) }"
 }

But this produces infinity in the response.

Tomo_M · February 4, 2022, 12:02am

Can you share the response itself and the problematic document?

LuigiGalati · February 4, 2022, 12:32am

Sure. So without the conditional, with a JSON script as follow:

{
"script":
"(doc['NUMBER3_ORCL_ES_ENG'].sum())/((doc['NUMBER1_ORCL_ES_ENG'].sum()+doc['NUMBER2_ORCL_ES_ENG'].sum())/2) "
}

I get the following output and then response. Note the INFINITY even though the aggregate values are nonzero.

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 10134,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "4": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "1": {
            "value": 7841
          },
          "5": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 5351
                },
                "2": {
                  "value": 5493
                },
                "3": {
                  "value": 737
                },
                "6": {
                  "value": 5422
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "f",
                "doc_count": 1423
              },
              {
                "1": {
                  "value": 2490
                },
                "2": {
                  "value": 2499
                },
                "3": {
                  "value": 341
                },
                "6": {
                  "value": 2494.5
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "m",
                "doc_count": 1183
              }
            ]
          },
          "key": 2021,
          "doc_count": 2606
        },
        {
          "1": {
            "value": 7625
          },
          "5": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 5205
                },
                "2": {
                  "value": 5348
                },
                "3": {
                  "value": 574
                },
                "6": {
                  "value": 5276.5
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "f",
                "doc_count": 1422
              },
              {
                "1": {
                  "value": 2420
                },
                "2": {
                  "value": 2486
                },
                "3": {
                  "value": 254
                },
                "6": {
                  "value": 2453
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "m",
                "doc_count": 1161
              }
            ]
          },
          "key": 2020,
          "doc_count": 2583
        },
        {
          "1": {
            "value": 7317
          },
          "5": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 4942
                },
                "2": {
                  "value": 5207
                },
                "3": {
                  "value": 607
                },
                "6": {
                  "value": 5074.5
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "f",
                "doc_count": 1422
              },
              {
                "1": {
                  "value": 2375
                },
                "2": {
                  "value": 2420
                },
                "3": {
                  "value": 306
                },
                "6": {
                  "value": 2397.5
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "m",
                "doc_count": 1138
              }
            ]
          },
          "key": 2019,
          "doc_count": 2560
        },
        {
          "1": {
            "value": 6977
          },
          "5": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 4684
                },
                "2": {
                  "value": 4946
                },
                "3": {
                  "value": 561
                },
                "6": {
                  "value": 4815
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "f",
                "doc_count": 1306
              },
              {
                "1": {
                  "value": 2293
                },
                "2": {
                  "value": 2382
                },
                "3": {
                  "value": 294
                },
                "6": {
                  "value": 2337.5
                },
                "7": {
                  "value": "Infinity"
                },
                "key": "m",
                "doc_count": 1079
              }
            ]
          },
          "key": 2018,
          "doc_count": 2385
        }
      ]
    }
  },
  "status": 200
}```

Tomo_M · February 4, 2022, 5:15am

Please share sample document and whole query if possible.

LuigiGalati · February 4, 2022, 2:08pm

Sorry about that. You're asking for the sample document and the request, yes? Here is an example where the Number1 and Number2 fields are both zero. When those are in the denominator of my scripted/json field i'll get infinity. But i only want to do the division after it's been summed, and that should never be zero as there are other documents which will have values in one or both of number1 and number 2 fields to roll up to a nonzero sum.

{
  "aggs": {
    "4": {
      "terms": {
        "field": "Fiscal Year",
        "size": 5,
        "order": {
          "1": "desc"
        }
      },
      "aggs": {
        "1": {
          "sum": {
            "field": "NUMBER1_ORCL_ES_ENG"
          }
        },
        "5": {
          "terms": {
            "field": "SEX_ORCL_ES_KW_ENG",
            "size": 5,
            "order": {
              "1": "desc"
            }
          },
          "aggs": {
            "1": {
              "sum": {
                "field": "NUMBER1_ORCL_ES_ENG"
              }
            },
            "2": {
              "sum": {
                "field": "NUMBER2_ORCL_ES_ENG"
              }
            },
            "3": {
              "sum": {
                "field": "NUMBER3_ORCL_ES_ENG"
              }
            },
            "6": {
              "sum": {
                "script": {
                  "source": "(doc['NUMBER1_ORCL_ES_ENG'].sum()+doc['NUMBER2_ORCL_ES_ENG'].sum())/2",
                  "lang": "painless"
                }
              }
            },
            "7": {
              "sum": {
                "script": "(doc['NUMBER3_ORCL_ES_ENG'].sum())/((doc['NUMBER1_ORCL_ES_ENG'].sum()+doc['NUMBER2_ORCL_ES_ENG'].sum())/2) "
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "_source": {
    "excludes": []
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {
    "Avg_Pop": {
      "script": {
        "source": "(doc['NUMBER1_ORCL_ES_ENG'].sum()+doc['NUMBER2_ORCL_ES_ENG'].sum())/2",
        "lang": "painless"
      }
    }
  },
  "docvalue_fields": [
    {
      "field": "LASTUPDDTTM_ORCL_ES_ENG",
      "format": "date_time"
    },
    {
      "field": "LASTUPDDTTM_ORCL_ES_KW_ENG",
      "format": "date_time"
    },
    {
      "field": "Last Modified Date Time",
      "format": "date_time"
    },
    {
      "field": "Last Update Date/Time",
      "format": "date_time"
    },
    {
      "field": "ORCL_ES_LAST_MODIFIED_DATE",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "match_all": {}
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Tomo_M · February 4, 2022, 4:05pm

Thank you! I understood the problem.

The problem is that the "script" under "sum" aggregation is just applied for each document. It works as a script field for the document. Finally the values of all documents were aggregated to the overall sum. So, the .sum() of doc['<field>'].sum() works for just array field and have no meanings for single value field.

I'm not sure why script with if condition did not work, but even if it works, I believe this script would not return your intention.

If it is Elasticsearch and not for kibana Aggregation-based Data Table, you may use bucket script aggregation to calculate over other aggregated metrics. However, bucket script aggregation is not supported in Aggregation-based Data Table and I have no idea about it.

I you are using recent kibana, you can use formula in Lens > Data Table. Using Formula, It is surprizingly easy to create this kind of data table.

LuigiGalati · February 4, 2022, 4:38pm

Thanks so much for your help. I don't know if i have Lens available to me, here are the functions available. Do you know what version Lens is on and also how do i see what version i'm on?:

LuigiGalati · February 4, 2022, 4:44pm

Sorry i just realized that LENS is a visualization type. I do not see it there. How can i see what version of Kibana I am on? Also, do you know if Lens is a visualization type that I can request from the developers at my company?

Again, thanks so much.

Tomo_M · February 4, 2022, 4:51pm

Lens is a little bit older, but Formula is released at 7.14. It's a quite new feature.
You can check your version from Stack Management or right-top circle button.

If you want to use Lens Formula, update kibana is the only way I think.

LuigiGalati · February 4, 2022, 5:50pm

So I'm currently on version 7.0.0 and i think it would be an extremely heavy lift (if not time-consuming) to get to the version that might have formula or lens in it.

That said, i think i can get to the numbers that i want via a bucket aggregation but i'm just not sure how to do that and then visualize it. I've been able to get close with the DEV TOOLS console but i don't know how to turn that into a visualization.

I'll leave the thread open until i can get someone with some bucket aggregation script experience to help out. Thanks so much, i definitely could have used Lens and Formula if i was on the right version.

Are you able to visualize Formula into something like a line graph?

Tomo_M · February 4, 2022, 10:53pm

It is possible by using customized visualization which use vega-lite. It needs some coding and need some effort to get familiar with it.

Of course you can use other types of chart in lenz. If you want to get line chat you have to convert the fiscal year to integer or timestamp.

Tomo_M · February 5, 2022, 10:56am

I come up with an easier way.

Index the fiscal year as timestamp and then use TSVB. Set the time interval for 1y and use special math aggregation which can calculate some formulat using results of other aggregations. It is the same as bucket script aggregation. Maybe this is the easier way for you than using vega-lite.

LuigiGalati · February 5, 2022, 4:33pm

So this is really great. I'm actually able to access that visualization. However, when i use the Time Series option, i'm getting data failed to load and i can't find where the details of the error might be:

LuigiGalati · February 5, 2022, 4:39pm

Tomo_M · February 5, 2022, 5:25pm

Maybe the error was caused by no datetime fields are specified for timestamp. That's what I said "index the fiscal year as timestamp".

(1) You may create another field and anyway (possibly using ingest pipline) convert 2019 to Jan 1, 2019 @ 00:00:00.000Z or something like that.

(2) Or add runtime field to the index pattern (this method may need recent kibana..).

int year = Integer.parseInt(doc['FISCAL_YEAR_ORCL_ES_ENG'].value.toString());
ZonedDateTime zdt = ZonedDateTime.of(year, 1,1,0,0,0,0,ZoneId.of('Z'));
emit(zdt.toInstant().toEpochMilli())

LuigiGalati · February 5, 2022, 6:10pm

I had done all of that (i used Peoplesoft's query to convert the fiscal year into a datetime field using the following:
to_timestamp(trunc(to_Date(B.FISCAL_YEAR,'YYYY'),'YEAR'))

I was calling the field LASTUPDDTTM_ORCL_ES_ENG and as you can see from the document, it is a timestamp that is the first datetime of the fiscal year. I was thinking the existence of the datetimestamp was the problem as well but it's not. Interestingly enough i'm able to see some data on the TABLE tab of VISUAL BUILDER, just not the TIME SERIES tab. Here's another example with a different index. This one is delivered with the peoplesoft product so it should work (Index: kibana_sample_logs):

and here is the time series tab with the same error that i am seeing for the index that i created by hand:

Tomo_M · February 5, 2022, 6:58pm

What will happen if broaden the time filter on the right top to ten years?

LuigiGalati · February 5, 2022, 8:00pm

It does not seem to matter what i choose for the time filter on the top right (although for some reason only Jan 1 2021 data is showing up in the table):

When i "discover" the data there is data for all four "fiscal years" (which is the LASTUPDDTTM field):

Topic		Replies	Views
Subtract two fields in a Timelion Query Kibana	2	2940	August 8, 2017
Dividing Value of Chart Kibana	3	3327	December 21, 2017
Calculus between two fields Kibana	5	2254	January 7, 2020
SQL Dividing 2 values from 2 queries in CANVAS Kibana canvas	5	3279	April 30, 2020
Kibana Canvas , Calculate percentage using to aggregations Kibana canvas	5	3629	August 17, 2019

Attempt to divide two fields at the aggregate level causing infinity response

Related topics