Hi there,
I have been asked whether Elasticsearch can help with the following task.
We want to analyze our users' workflows, meaning we want to check in which order users call our functions within a session, and we would like to predict what users will do next. Let's take an example. Our logs provide the following information about some user workflows:
session 1: func1, func2, func16, func18, func46, func77, func33
session 2: func2, func16, func18, func46, func1, func12
session 3: func11, func16, func48, func88
session 4: func1, func2, func16, func18, func46, func77, func32
session 6: func1, func2, func46, func18, func16, func77, func33
Above we see 5 sessions and the order of their function calls.
I would like to be able to search for all workflows that have called func16, then func18, then func46 (in this order). And I want to get back what the next function call may be, ordered by the number of occurrences in my data lake.
So sessions 1, 2 and 4 are valid matches, because they called the searched functions in the given order.
As a result I would like to get the next function call, ordered by hits.
So I would like to get:
hits: 2, next workflow item func77 (by sessions 1 and 4)
hits: 1, next workflow item func1 (by session 2)
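To make the expected result precise, here is a minimal pure-Python sketch of the computation I have in mind. It does not use Elasticsearch at all; the function name next_calls is only a placeholder of mine, and I am assuming the three searched calls must appear directly one after another.

```python
from collections import Counter

# Sample sessions copied from the example above (session ids as listed).
sessions = {
    1: ["func1", "func2", "func16", "func18", "func46", "func77", "func33"],
    2: ["func2", "func16", "func18", "func46", "func1", "func12"],
    3: ["func11", "func16", "func48", "func88"],
    4: ["func1", "func2", "func16", "func18", "func46", "func77", "func32"],
    6: ["func1", "func2", "func46", "func18", "func16", "func77", "func33"],
}

def next_calls(sessions, pattern):
    """Count which call directly follows each contiguous occurrence of `pattern`."""
    counts = Counter()
    n = len(pattern)
    for calls in sessions.values():
        # Stop early enough that a following call still exists.
        for i in range(len(calls) - n):
            if calls[i:i + n] == pattern:
                counts[calls[i + n]] += 1
    return counts

result = next_calls(sessions, ["func16", "func18", "func46"])
for func, hits in result.most_common():
    print(f"hits: {hits}, next workflow item {func}")
# hits: 2, next workflow item func77
# hits: 1, next workflow item func1
```

Note that this counts every occurrence, so a session that repeats the pattern would contribute more than one hit.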
All function calls are coming as single log events. Each workflow can be identified by some session id.
What possibilities do I have to reach my aim?
I have the following ideas in mind, but they might not work or might have significant drawbacks.
#1: I could create parent / child relationships, where each function call has the previous function call as its parent. But is it possible to search for a chain of parent / child relations in Elasticsearch with a single query? And how expensive would that be?
#2: I could create a long string field in a workflow event. With each function call I will append the new function call to the string.
- Then I could search for a substring. I assume I need some analyzer tweaking (I have not worked with analyzers yet) to make this reasonably performant. Are there any analyzers that respect the order of terms? It is also possible that a function is called multiple times, so I also need to distinguish between
func1, func2, func1
and
func2, func1, func1
calls. Can analyzers help here? I assume a wildcard search like
*func16, func18, func46*
will be really expensive.
- Could I also add some fuzzy search options, so that I still get results when a single function call differs or is missing?
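To illustrate idea #2, here is a rough pure-Python sketch of the string field and the ordered substring match. It only emulates the behaviour I am after and says nothing about how Elasticsearch would index or analyze such a field; the helper names to_field and contains_in_order are made up for this example.

```python
def to_field(calls):
    """Join calls into one string, with a separator on both sides of every name."""
    # The leading and trailing commas keep "func1" from matching inside "func16".
    return "," + ",".join(calls) + ","

def contains_in_order(field, pattern):
    """True if the calls in `pattern` appear in `field` directly after each other."""
    return to_field(pattern) in field

session_1 = to_field(["func1", "func2", "func16", "func18", "func46", "func77", "func33"])
a = to_field(["func1", "func2", "func1"])
b = to_field(["func2", "func1", "func1"])

print(contains_in_order(session_1, ["func16", "func18", "func46"]))  # True
print(contains_in_order(session_1, ["func46", "func18", "func16"]))  # False, wrong order
print(contains_in_order(a, ["func1", "func1"]))                      # False
print(contains_in_order(b, ["func1", "func1"]))                      # True
```

The two repeated-call workflows above stay distinguishable because the match is on the whole ordered string, not on the set of terms it contains.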
I would be glad if someone could tell me whether Elasticsearch is the right tool to answer this question. A pointer toward #1, #2 or a #3 (an idea of your own) would be really appreciated.
Thanks a lot, Andreas