Recursive nested documents in Elasticsearch

Hi all,

I have a log file containing a series of jobs (each job containing sub jobs recursively) along with start and end time for each job. The log file looks like this:

 Job Name           Start time           End time             
___________________ ____________________ ____________________
A                   01/31/2017 13:05:23  01/31/2017 20:49:14
-i                  01/31/2017 13:05:37  01/31/2017 17:24:06
--a                 01/31/2017 13:33:16  01/31/2017 13:59:46
---1                01/31/2017 13:33:17  01/31/2017 13:33:17
---2                01/31/2017 13:33:28  01/31/2017 13:33:29
--b                 01/31/2017 13:59:47  01/31/2017 14:03:42
---1                01/31/2017 13:59:48  01/31/2017 13:59:51
---2                01/31/2017 13:59:51  01/31/2017 14:00:36
-ii                 01/31/2017 13:59:51  01/31/2017 14:20:36
--a                 01/31/2017 13:56:51  01/31/2017 14:03:36
---1                01/31/2017 13:34:51  01/31/2017 14:05:36
B                   01/31/2017 13:52:51  01/31/2017 14:48:36
-i                  01/31/2017 13:12:51  01/31/2017 14:22:36

Here, each level of indentation, represented by '-', denotes a sub job. I have parsed the file with Logstash using a grok filter and fed the output to Elasticsearch. I created the default Logstash index pattern in Kibana. My aim is to calculate the job times at every level so that I can drill down or roll up between levels. So far I have only been able to calculate the time for each individual job, using this scripted field:
doc['end_time'].value - doc['start_time'].value

I have not defined any mapping; I let Logstash create the default one.

I think I need nested mappings so that a document representing a sub job sits inside another document representing its parent job. I am unable to proceed because I cannot find any documentation on creating such recursive nested documents. Could you please help me out?

Thanks in advance

Do you really need recursive nesting?

If your job names are indexed to reflect the hierarchy, e.g. A/i/a/2, and also as the leaf 2, you can query average durations both for all 2 jobs (regardless of hierarchy) and for those that are part of A/i/a.
Reading your example log file, it looks like job A already has a duration which need not be summed from its constituent sub jobs?
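To make the path idea concrete, here is a minimal Python sketch that turns the dash-indented lines into flat records carrying a full path. It assumes the number of leading dashes gives the job's depth and that timestamps look like the ones in the sample. The function name `parse_jobs` and the field names `job_name`, `job_path`, `level` and `duration_s` are made up for illustration; this is a standalone pre-processing step, not something the grok filter in the thread does.

```python
from datetime import datetime

# Timestamp layout taken from the sample log in the question.
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_jobs(lines):
    """Turn dash-indented job lines into flat records with a full path."""
    stack = []    # names of the current ancestors, one entry per level
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # header, separator, or blank line
        raw_name, start_date, start_clock, end_date, end_clock = parts
        try:
            start = datetime.strptime(f"{start_date} {start_clock}", TIME_FORMAT)
            end = datetime.strptime(f"{end_date} {end_clock}", TIME_FORMAT)
        except ValueError:
            continue  # not a job line after all
        name = raw_name.lstrip("-")
        depth = len(raw_name) - len(name)  # number of leading dashes
        del stack[depth:]                  # drop ancestors that are no longer open
        stack.append(name)
        records.append({
            "job_name": name,               # leaf name, e.g. "2" (hypothetical field)
            "job_path": "/".join(stack),    # e.g. "A/i/a/2" (hypothetical field)
            "level": depth + 1,
            "duration_s": (end - start).total_seconds(),
        })
    return records


SAMPLE = """\
A                   01/31/2017 13:05:23  01/31/2017 20:49:14
-i                  01/31/2017 13:05:37  01/31/2017 17:24:06
--a                 01/31/2017 13:33:16  01/31/2017 13:59:46
---1                01/31/2017 13:33:17  01/31/2017 13:33:17
---2                01/31/2017 13:33:28  01/31/2017 13:33:29
B                   01/31/2017 13:52:51  01/31/2017 14:48:36
-i                  01/31/2017 13:12:51  01/31/2017 14:22:36
"""

records = parse_jobs(SAMPLE.splitlines())
for record in records:
    print(record["job_path"], record["duration_s"])
```

Each record is still a flat document, so the two sub jobs called i end up with distinct paths (A/i and B/i) while sharing the same leaf name.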


Thanks for the quick response.

My input log file contains spaces before the names of the jobs which represent the jobs' level. In my grok filter, I ignore all the spaces for each new line and hence each job is treated as a separate entity and there is no hierarchy as such when the log file is indexed. I would like to maintain the hierarchy.

My actual intention is to represent the job names and their corresponding job times on a graph with the ability to drill down to various levels to view the job times at various levels.

As you correctly mentioned, the job time of each job is already the sum of its sub jobs. But since each job is an individual document in my index with no relation to the others, I am unable to write a suitable query to visualize the job times in a Kibana vertical bar chart.

OK - so like JDiskReport but for figuring out where time goes, not disk space. Is this the right way of thinking about the problem, i.e. you only want to account for sub job i in terms of what it contributed to parent job A/i, rather than looking for totals of sub jobs called i across A/i, B/i, foo/i etc.?


I'm extremely sorry, I gave the wrong example of sub job i.
Each job has a completely different set of sub jobs. So i inside A is not a sub job of any other job.

I want to account for the contribution of each sub job to its respective super job to the level of the highest super job.

OK, so it's strictly hierarchical.

Presumably in Kibana you're going to want to go from a root-level overview of all level-1 jobs and allow a drill-down when you click on one of them to its level-2 jobs, and from there drill down to its level-3 jobs etc. Unfortunately I can't think of a slick way of doing this - you'll need to check in the Kibana forum for ideas. My assumption is that you'd have a saved viz showing a bar chart of a terms agg on a "level1" field. Clicking bar B on that would create a level1:B filter. If you choose to "pin" this filter you could then open another saved viz on the level2 field - showing IDs of level-2 jobs but filtered by your selection of level1:B.
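For reference, the pinned filter plus terms agg described above roughly corresponds to a search body like the one built below. This is only a sketch: the field names `level1`, `level2` and `duration_ms` are hypothetical, the helper `drilldown_body` is made up, and it assumes the level fields are indexed as keyword-type fields and that each document already stores its duration as a number.

```python
import json


def drilldown_body(parent_field, parent_value, child_field, duration_field):
    """Build a search body: filter on the parent level, bucket by the child level."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "query": {
            "bool": {
                # Equivalent of the pinned level1:B filter.
                "filter": [{"term": {parent_field: parent_value}}]
            }
        },
        "aggs": {
            "by_child": {
                # One bucket (one bar) per child job under the selected parent.
                "terms": {"field": child_field},
                "aggs": {
                    "total_duration": {"sum": {"field": duration_field}}
                },
            }
        },
    }


# Hypothetical field names, matching the "level1"/"level2" idea above.
body = drilldown_body("level1", "B", "level2", "duration_ms")
print(json.dumps(body, indent=2))
```

The same body with `level2` as the parent and `level3` as the child would give the next drill-down step, which is why one field per level keeps each step a plain filter plus terms agg.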

We'd also need to figure out how best to get logstash to create the indexed terms.

It's probably best to walk back from the UI you're going to build towards what index format is required to support that and in turn what ETL process will feed that.

Correct @Mark_Harwood

Thanks for the suggestion, will look for help in the Logstash and Kibana forums.

So do you think nested documents aren't a suitable solution to my problem?

And isn't there any other option with respect to Elasticsearch which would help me in this matter?

Technically you could hold only leaf level docs and use Elasticsearch to aggregate everything up on-the-fly (assuming the duration of a job is fully accounted for by the sum of the contained job durations).
I think the constraining factor here will be what your choice of UI can do. Kibana doesn't do nested docs. It may not even give a slick UI on flat docs, for the reasons I outlined. That's why I recommend starting by figuring out how you can make the UI work first - I don't think Elasticsearch will be the source of your problems.
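To illustrate what "aggregate everything up" from leaf docs means, here is a small client-side sketch in Python. It assumes each leaf job is identified by a slash-separated path (a hypothetical convention) and that a parent's duration really is the sum of its leaves, which may not hold for the sample log. The function name `roll_up` is made up; inside Elasticsearch the equivalent would be sum aggregations over per-level fields.

```python
from collections import defaultdict


def roll_up(leaf_durations):
    """Sum leaf durations into every ancestor path, including the leaf itself."""
    totals = defaultdict(float)
    for path, seconds in leaf_durations.items():
        parts = path.split("/")
        # Credit the leaf's time to "A", "A/i", "A/i/a", ... down to the leaf.
        for depth in range(1, len(parts) + 1):
            totals["/".join(parts[:depth])] += seconds
    return dict(totals)


# Leaf durations in seconds, taken from the A/i branch of the sample log.
leaves = {
    "A/i/a/1": 0,
    "A/i/a/2": 1,
    "A/i/b/1": 3,
    "A/i/b/2": 45,
}

totals = roll_up(leaves)
for path in sorted(totals):
    print(path, totals[path])
```

Note that the rolled-up total for A/i/a is 1 second here, while the log records about 26 minutes for that job, so the "fully accounted for" assumption is worth checking against real data before relying on leaf-only docs.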

Okay, I'll see what else I can do.

