Hello,
We have an application that models transactional data. Each document describes a single transaction between two entities.
Our current import process runs in two phases. First we ingest all the transactions into a transactions
index. Afterwards, we run a suite of aggregations against that index to build an entities
index, which contains aggregated fields from the transaction documents along with some higher-level statistics (e.g. total transactions, total transaction sum, etc.). However, over the years this system has grown unwieldy and slow.
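To give a sense of it, the second phase boils down to aggregations roughly along these lines, with the per-entity results merged and bulk-indexed into entities (heavily simplified; aggregation names made up, field names match the example transactions below):

POST transactions/_search
{
  "size": 0,
  "aggs": {
    "by_src": {
      "terms": { "field": "src_id" },
      "aggs": {
        "transaction_sum": { "sum": { "field": "src_field" } }
      }
    },
    "by_dst": {
      "terms": { "field": "dst_id" },
      "aggs": {
        "transaction_sum": { "sum": { "field": "dst_field" } }
      }
    }
  }
}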
A colleague recommended using Elasticsearch transforms for this, and at first glance it seemed like a perfect fit. However, we have run into implementation issues because each transaction aggregates into two entities.
For example, these two transactions:
{
  "ids": ["a", "b"],
  "src_id": "a",
  "dst_id": "b",
  "src_field": 5,
  "dst_field": 6
},
{
  "ids": ["b", "c"],
  "src_id": "b",
  "dst_id": "c",
  "src_field": 3,
  "dst_field": 12
}
which describe the following relationships:
entityA          entityB          entityC
   ^            ^       ^            ^
    \          /         \          /
     \        /           \        /
    transactionAB        transactionBC
would need to transform into three entity docs:
{
  "id": "a",
  "count": 1,
  "field_sum": 5
},
{
  "id": "b",
  "count": 2,
  "field_sum": 9
},
{
  "id": "c",
  "count": 1,
  "field_sum": 12
}
I've attempted running two separate transforms, one grouping on the src_id side of the transaction and another on the dst_id side, each aggregating the respective fields, but they overwrite each other's output in the destination index.
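Simplified, the two transforms look roughly like this (transform names are made up for illustration):

PUT _transform/entities_from_src
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "src_id" } }
    },
    "aggregations": {
      "count": { "value_count": { "field": "src_id" } },
      "field_sum": { "sum": { "field": "src_field" } }
    }
  }
}

PUT _transform/entities_from_dst
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "dst_id" } }
    },
    "aggregations": {
      "count": { "value_count": { "field": "dst_id" } },
      "field_sum": { "sum": { "field": "dst_field" } }
    }
  }
}

As far as I can tell, the destination document _id is derived from the group_by value, so for an entity like b that appears on both sides, the two transforms keep writing to the same document and the last checkpoint wins.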
I've also attempted running it as a single transform that groups on the ids field, but then the src and dst fields get mixed together.
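That attempt looked roughly like this (again, names made up):

PUT _transform/entities_from_ids
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "ids" } }
    },
    "aggregations": {
      // one value per doc, so this counts transactions correctly
      "count": { "value_count": { "field": "src_id" } },
      // but there is no way (that I can see) to pick src_field when the
      // bucket key is the doc's src_id and dst_field when it's the dst_id
      "field_sum": { "sum": { "field": "src_field" } }
    }
  }
}

With the example data above, entity b gets field_sum = 5 + 3 = 8 from src_field alone, when the correct value is 6 + 3 = 9 (dst_field of transactionAB plus src_field of transactionBC).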
I think this could be solved using three transforms and an intermediary index, but at that point the complexity isn't worth it in our use case.
Is there any way to achieve the desired result without the use of intermediary indices? I know Elasticsearch has a lot of functionality around scripted aggregations and scripted processors, and I was hoping someone here could point me in the right direction, but I'm not sure whether they can solve this issue.
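For what it's worth, the closest I've sketched is a scripted_metric along these lines (a sketch, not working code). It accumulates a per-entity sum inside each bucket, but as far as I can tell the scripts have no access to the bucket's group_by key, so the reduce step can't pick out the entry belonging to the bucket's own entity:

"aggregations": {
  "sums_by_entity": {
    "scripted_metric": {
      "init_script": "state.sums = [:]",
      "map_script": """
        String src = doc['src_id'].value;
        String dst = doc['dst_id'].value;
        state.sums[src] = (state.sums.containsKey(src) ? state.sums[src] : 0) + doc['src_field'].value;
        state.sums[dst] = (state.sums.containsKey(dst) ? state.sums[dst] : 0) + doc['dst_field'].value;
      """,
      "combine_script": "return state.sums",
      "reduce_script": """
        // merges the per-shard maps; for entity b's bucket this yields
        // { a: 5, b: 9, c: 12 }, so the right answer is in there, but
        // nothing tells the script that this bucket's key is 'b'
        Map merged = [:];
        for (s in states) {
          for (entry in s.entrySet()) {
            merged[entry.getKey()] = (merged.containsKey(entry.getKey()) ? merged[entry.getKey()] : 0) + entry.getValue();
          }
        }
        return merged;
      """
    }
  }
}

If there is some way to get at the group key from inside the scripts, or some other scripted approach I've missed, that would solve this nicely.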