Create custom metric for Y-axis, dealing with no existed values

Hello,
I just started to use ELK and I want your help with issues.
I have a dataset with the following fields
Username,Name,Recruitment_date,Recruitment_year,duration_at_company,leave_date, leave_year

and I want to find the number of employees for each year. So we want
for year in years_set:

Recruitment_year<=year and leave_year >= year

My first issue is how I can write something like the above command to kibana. It is easy for a given year using a static query but I want it for every year.

My second issue is that I wonder if there is any way to deal with dates or generally values of dates that don't exist.
For example,at the following diagram I want to find the average duration of employees per year of recruitment. But at 2011 there isn't any recruitment and we have a continues line from 2010 to 2012. Instead I would prefer to have a zero value for 2011.

Hi @lsouvleros,

it would be helpful to know whether you want to visualize the data using Kibana's visualizations or extract them as JSON via Elasticsearch's http API and use them in another tool. I will assume the former for my answers.

For your first question I would suggest to add a scripted field of type number called employed_years to your index defined as

IntStream.range(doc['Recruitment_year'].value, doc['leave_year'].value).toArray()

which holds all the years that the person was an employee. Then you can use that fields e.g. in histogram or terms aggregations to get the number of employees.

I will get to your second question in a short while in a follow-up post. :wink:

The wording of your last paragraph suggested that you intended to show an example of the diagram. That might indeed be useful to help my understanding of your problem.

Look for example the following picture. For 2010 and 2012 I have some values which give me information for these years, but for 2011 I don't have any value/record. So I would like the value of 2011 to be 0. Instead there is no value for 2011 and the line just piece together the values of 2010 and 2012.

Hi @weltenwort,

Thanks you for your help. I want to visualize the data using Kibana's visualization.

At a next stage, could I use a scripted field again to calculate the following
number_of_recruitments_per_year / total_number_of_employees_per year ?

Ok, I see what you mean in regards to missing values. By default Elasticsearch leaves out empty buckets and Kibana's line visualization just connects the non-empty buckets. If you are using a Histogram or Date Histogram aggregation on the x-axis you can force Elasticsearch to also return empty buckets in between by specifying {"min_doc_count" : 0} in the advanced JSON Input field of the aggregation:

image

Since the data are not continuous anyway, disabling the line and just using the dots or switching to a vertical bar chart might also be appropriate here.

Regarding the ratio: Scripted fields can only access data from a single document at a time. It is therefore not possible to use them to calculate such a ratio. By combining several pipeline aggregations it should be possible to let Elasticsearch calculate that, but those aggregations are unfortunately not yet supported by the visualization editor.

So there is no solution or trick to calculate it at the moment?

I am getting this error
Unrecognized function call (IntStream.range),"status":500

Moreover, there is type Array to Kibana. What type of data will it save?
How will be their format?

Which version of the Elastic Stack are you using?

5.6.3

This worked changing Int to Long
LongStream.range(doc['recruitment_year'].value, doc['leave_year'].value).toArray()
It was a false error. Thanks you a lot for your help.
If you can suggest any way to visualize the ratio I would be grateful.

I was able to produce something that might match your requirements using timelion. It contains both graphs in one, but could obviously also be separated:

It could even display a higher accuracy than just years (if your dates are precise enough):

The timelion expression for that was

$INDEX="employee", $START_FIELD="recruitment_date", $END_FIELD="leave_date", .es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum().label("employee count").lines(width=1, fill=3, steps=true), .es(index=$INDEX, timefield=$START_FIELD).divide(.es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum()).label("recruitment/employee ratio").lines(width=1, fill=7, steps=true)

This doesn't depend on the scripted fields we defined earlier but uses timelion's ad-hoc math capabilities.

To make the ratio more readable, it could also be placed on the right-hand axis:

$INDEX="employee", $START_FIELD="recruitment_date", $END_FIELD="leave_date", .es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum().label("employee count").lines(width=1, fill=3, steps=true), .es(index=$INDEX, timefield=$START_FIELD).divide(.es(index=$INDEX, timefield=$START_FIELD).subtract(.es(index=$INDEX, timefield=$END_FIELD)).cusum()).label("recruitment/employee ratio").lines(width=1, fill=7, steps=true).yaxis(yaxis=2, units="percent", max=2)
1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.