Hi, I've been having a little difficulty solving this particular problem using the tools available in Canvas.
In a relational database I have a table of contracts with ids and types of contract and each contract may have many tasks with ids, start date, end date and task name. Those were originally ingested in elastic as two indexes (one for contracts and one for tasks, which also contain all contract data) and I have limited say as on how to make this ingestion. Each contract doc in elastic also contains a column containing all tasks for that contract as a string combining start date, end date (if existing) and task name, in the following format:
start_date - (end_date | "N/A") - task_name
What I need is a way to count the sum of number of times two specific tasks occured (Task A or Task B for simplicity sake) for each type of contract, as long as Task A didn't occur right after Task B or A, in which case they should only be counted once. So:
C, A, C, A
should sum 2
C, B, C, A
should be 2 as well, while
C, B, A, C
should be 1 and
C, A, A, A
should be 1 as well.
There's also a time constraint, in which all start dates must be in the last "closed" month (right now that's Feb, 2019)
I can use essql with the tasks index to count all occurences of the tasks in the month, but that doesn't account for the duplicates (because looking at tasks alone I have no way of knowing what came before in the contract).
Using the contracts index I can bring the list of tasks as a string for each contract. My instinct is to tokenize, clean it of dupes in a list and then count it, but I found no way to do that using the functions or examples described in canvas.
Maybe someone has been through something similar before and can shed some light. Thank you very much.