Create custom metric for Y-axis, dealing with no existed values

lsouvleros · October 25, 2017, 10:58am

Hello,
I just started to use ELK and I want your help with issues.
I have a dataset with the following fields
Username,Name,Recruitment_date,Recruitment_year,duration_at_company,leave_date, leave_year

and I want to find the number of employees for each year. So we want
for year in years_set:

Recruitment_year<=year and leave_year >= year

My first issue is how I can write something like the above command to kibana. It is easy for a given year using a static query but I want it for every year.

My second issue is that I wonder if there is any way to deal with dates or generally values of dates that don't exist.
For example,at the following diagram I want to find the average duration of employees per year of recruitment. But at 2011 there isn't any recruitment and we have a continues line from 2010 to 2012. Instead I would prefer to have a zero value for 2011.

weltenwort · October 25, 2017, 2:46pm

Hi @lsouvleros,

it would be helpful to know whether you want to visualize the data using Kibana's visualizations or extract them as JSON via Elasticsearch's http API and use them in another tool. I will assume the former for my answers.

For your first question I would suggest to add a scripted field of type number called employed_years to your index defined as

IntStream.range(doc['Recruitment_year'].value, doc['leave_year'].value).toArray()

which holds all the years that the person was an employee. Then you can use that fields e.g. in histogram or terms aggregations to get the number of employees.

I will get to your second question in a short while in a follow-up post.

weltenwort · October 25, 2017, 5:56pm

The wording of your last paragraph suggested that you intended to show an example of the diagram. That might indeed be useful to help my understanding of your problem.

lsouvleros · October 25, 2017, 10:27pm

Look for example the following picture. For 2010 and 2012 I have some values which give me information for these years, but for 2011 I don't have any value/record. So I would like the value of 2011 to be 0. Instead there is no value for 2011 and the line just piece together the values of 2010 and 2012.

lsouvleros · October 26, 2017, 8:12am

Hi @weltenwort,

Thanks you for your help. I want to visualize the data using Kibana's visualization.

At a next stage, could I use a scripted field again to calculate the following
number_of_recruitments_per_year / total_number_of_employees_per year ?

weltenwort · October 26, 2017, 9:12am

Ok, I see what you mean in regards to missing values. By default Elasticsearch leaves out empty buckets and Kibana's line visualization just connects the non-empty buckets. If you are using a Histogram or Date Histogram aggregation on the x-axis you can force Elasticsearch to also return empty buckets in between by specifying {"min_doc_count" : 0} in the advanced JSON Input field of the aggregation:

Since the data are not continuous anyway, disabling the line and just using the dots or switching to a vertical bar chart might also be appropriate here.

Regarding the ratio: Scripted fields can only access data from a single document at a time. It is therefore not possible to use them to calculate such a ratio. By combining several pipeline aggregations it should be possible to let Elasticsearch calculate that, but those aggregations are unfortunately not yet supported by the visualization editor.

lsouvleros · October 26, 2017, 9:20am

So there is no solution or trick to calculate it at the moment?

lsouvleros · October 26, 2017, 11:53am

I am getting this error
Unrecognized function call (IntStream.range),"status":500

Moreover, there is type Array to Kibana. What type of data will it save?
How will be their format?

weltenwort · October 26, 2017, 12:12pm

Which version of the Elastic Stack are you using?

lsouvleros · October 26, 2017, 12:14pm

5.6.3

lsouvleros · October 27, 2017, 8:27am

This worked changing Int to Long
LongStream.range(doc['recruitment_year'].value, doc['leave_year'].value).toArray()
It was a false error. Thanks you a lot for your help.
If you can suggest any way to visualize the ratio I would be grateful.

weltenwort · October 27, 2017, 9:43am

I was able to produce something that might match your requirements using timelion. It contains both graphs in one, but could obviously also be separated:

It could even display a higher accuracy than just years (if your dates are precise enough):

The timelion expression for that was

$INDEX="employee", $START_FIELD="recruitment_date", $END_FIELD="leave_date", .es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum().label("employee count").lines(width=1, fill=3, steps=true), .es(index=$INDEX, timefield=$START_FIELD).divide(.es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum()).label("recruitment/employee ratio").lines(width=1, fill=7, steps=true)

This doesn't depend on the scripted fields we defined earlier but uses timelion's ad-hoc math capabilities.

weltenwort · October 27, 2017, 9:47am

To make the ratio more readable, it could also be placed on the right-hand axis:

$INDEX="employee", $START_FIELD="recruitment_date", $END_FIELD="leave_date", .es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum().label("employee count").lines(width=1, fill=3, steps=true), .es(index=$INDEX, timefield=$START_FIELD).divide(.es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum()).label("recruitment/employee ratio").lines(width=1, fill=7, steps=true).yaxis(yaxis=2, units="percent", max=2)

system · November 24, 2017, 9:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Attempt to divide two fields at the aggregate level causing infinity response Kibana	21	2052	March 5, 2022
Calculations with ELK Elasticsearch	2	883	July 31, 2017
Challenges with date/time range selection when combining snapshot and timeseries data Kibana	18	855	July 19, 2022
Convert MySQL query to elasticsearch Kibana	16	8415	July 6, 2017
Kibana Histogram unexpected line curve with cumulative value Elasticsearch	14	552	July 6, 2017

Create custom metric for Y-axis, dealing with no existed values

Related topics