Hello,
We have an application that models transactional data. Each document describes a single transaction between two entities.
Our current import process runs in two phases. First we ingest all the transactions into a transactions
index. Afterwards, we run a suite of aggregations against that index to build an entities
index, which contains aggregated fields from the transaction documents along with some higher-level statistics (e.g. total transactions, total transaction sum, etc.). However, over the years this system has grown unwieldy and slow.
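To give a sense of it, the second phase boils down to aggregations roughly along these lines, with the per-entity results merged and bulk-indexed into entities (heavily simplified; aggregation names made up, field names match the example transactions below):

POST transactions/_search
{
  "size": 0,
  "aggs": {
    "by_src": {
      "terms": { "field": "src_id" },
      "aggs": {
        "transaction_sum": { "sum": { "field": "src_field" } }
      }
    },
    "by_dst": {
      "terms": { "field": "dst_id" },
      "aggs": {
        "transaction_sum": { "sum": { "field": "dst_field" } }
      }
    }
  }
}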
A colleague recommended using Elasticsearch transforms for this, and at first glance it seemed like a perfect fit. However, we have run into implementation issues because each transaction aggregates into two entities.
For example, these two transactions:
{
  "ids": ["a", "b"],
  "src_id": "a",
  "dst_id": "b",
  "src_field": 5,
  "dst_field": 6
},
{
  "ids": ["b", "c"],
  "src_id": "b",
  "dst_id": "c",
  "src_field": 3,
  "dst_field": 12
}
which describe the following relationships:
entityA          entityB          entityC
   ^            ^       ^            ^
    \          /         \          /
     \        /           \        /
    transactionAB        transactionBC
would need to transform into three entity docs:
{
  "id": "a",
  "count": 1,
  "field_sum": 5
},
{
  "id": "b",
  "count": 2,
  "field_sum": 9
},
{
  "id": "c",
  "count": 1,
  "field_sum": 12
}
I've attempted running two separate transforms, one grouping on the src_id side of the transaction and another on the dst_id side, each aggregating the respective fields, but they overwrite each other's output in the destination index.
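Simplified, the two transforms look roughly like this (transform names are made up for illustration):

PUT _transform/entities_from_src
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "src_id" } }
    },
    "aggregations": {
      "count": { "value_count": { "field": "src_id" } },
      "field_sum": { "sum": { "field": "src_field" } }
    }
  }
}

PUT _transform/entities_from_dst
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "dst_id" } }
    },
    "aggregations": {
      "count": { "value_count": { "field": "dst_id" } },
      "field_sum": { "sum": { "field": "dst_field" } }
    }
  }
}

As far as I can tell, the destination document _id is derived from the group_by value, so for an entity like b that appears on both sides, the two transforms keep writing to the same document and the last checkpoint wins.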
I've also attempted running it as a single transform that groups on the ids field, but then the src and dst fields get mixed together.
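That attempt looked roughly like this (again, names made up):

PUT _transform/entities_from_ids
{
  "source": { "index": "transactions" },
  "dest": { "index": "entities" },
  "pivot": {
    "group_by": {
      "id": { "terms": { "field": "ids" } }
    },
    "aggregations": {
      // one value per doc, so this counts transactions correctly
      "count": { "value_count": { "field": "src_id" } },
      // but there is no way (that I can see) to pick src_field when the
      // bucket key is the doc's src_id and dst_field when it's the dst_id
      "field_sum": { "sum": { "field": "src_field" } }
    }
  }
}

With the example data above, entity b gets field_sum = 5 + 3 = 8 from src_field alone, when the correct value is 6 + 3 = 9 (dst_field of transactionAB plus src_field of transactionBC).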
I think this could be solved using three transforms and an intermediary index, but at that point the complexity isn't worth it in our use case.
Is there any way to achieve the desired result without the use of intermediary indices? I know Elasticsearch has a lot of functionality around scripted aggregations and scripted processors, and I was hoping someone here could point me in the right direction, but I'm not sure whether they can solve this issue.
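For what it's worth, the closest I've sketched is a scripted_metric along these lines (a sketch, not working code). It accumulates a per-entity sum inside each bucket, but as far as I can tell the scripts have no access to the bucket's group_by key, so the reduce step can't pick out the entry belonging to the bucket's own entity:

"aggregations": {
  "sums_by_entity": {
    "scripted_metric": {
      "init_script": "state.sums = [:]",
      "map_script": """
        String src = doc['src_id'].value;
        String dst = doc['dst_id'].value;
        state.sums[src] = (state.sums.containsKey(src) ? state.sums[src] : 0) + doc['src_field'].value;
        state.sums[dst] = (state.sums.containsKey(dst) ? state.sums[dst] : 0) + doc['dst_field'].value;
      """,
      "combine_script": "return state.sums",
      "reduce_script": """
        // merges the per-shard maps; for entity b's bucket this yields
        // { a: 5, b: 9, c: 12 }, so the right answer is in there, but
        // nothing tells the script that this bucket's key is 'b'
        Map merged = [:];
        for (s in states) {
          for (entry in s.entrySet()) {
            merged[entry.getKey()] = (merged.containsKey(entry.getKey()) ? merged[entry.getKey()] : 0) + entry.getValue();
          }
        }
        return merged;
      """
    }
  }
}

If there is some way to get at the group key from inside the scripts, or some other scripted approach I've missed, that would solve this nicely.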