Recommendations to Create Entity-Specific Models with Nested Hits/Documents

Hi all!

I have event logs from my system that describe the actions that users take. I am trying to devise the best method for storing those events in a user entity model.

For example, my event data looks like this:

{time: 2021-01-03, customerId: 1234-5678, action: "logged-in"}
{time: 2021-01-03, customerId: 1234-5678, action: "viewed-item", item-id: 222}
{time: 2021-01-07, customerId: 1234-5678, action: "logged-in"}
{time: 2021-01-07, customerId: 1234-5678, action: "viewed-item", item-id: 444}
{time: 2021-01-07, customerId: 1234-5678, action: "viewed-item", item-id: 555}
{time: 2021-01-11, customerId: 1234-5678, action: "logged-in"}
{time: 2021-01-11, customerId: 1234-5678, action: "viewed-item", item-id: 444}

And my desired output is something like this:

{
    customerId: 1234-5678,
    productsViewed: [{time: 2021-01-03, item-id: 222},{time: 2021-01-07, item-id: 444},{time: 2021-01-07, item-id: 555},{time: 2021-01-11, item-id: 444}],
    logins: [{time: 2021-01-03},{time: 2021-01-07},{time: 2021-01-11}]
}

Is there a recommended approach using the Elastic stack to transform my event logs into entity-centric models that contain nested (and abbreviated) event data like this?

Elasticsearch transforms do not support aggregations that return hits/documents (e.g. top_hits), so I do not see a way to leverage transforms to output a nested array of abbreviated events. I do not believe ingest pipelines are the solution, either, as they seem to transform documents more than create entities.

To access full documents you can use a scripted_metric aggregation. The out-of-the-box aggregations work on single fields, not on whole documents.

top_hits and top_metrics won't help you either, as you would still need to collapse the entries into a single document.
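To make the collapsing step concrete, here is a rough sketch in plain Python (not Painless) of what a scripted_metric aggregation would have to do for the example in the question. It mirrors the four script phases (init, map, combine, reduce) and groups by customerId the way a transform's group_by would. The function names, the quoted field values, and the simulated two-"shard" split are assumptions made for illustration only; this is not an actual transform definition.

```python
from collections import defaultdict

# Sample events from the question; values are quoted here so they are valid Python.
EVENTS = [
    {"time": "2021-01-03", "customerId": "1234-5678", "action": "logged-in"},
    {"time": "2021-01-03", "customerId": "1234-5678", "action": "viewed-item", "item-id": 222},
    {"time": "2021-01-07", "customerId": "1234-5678", "action": "logged-in"},
    {"time": "2021-01-07", "customerId": "1234-5678", "action": "viewed-item", "item-id": 444},
    {"time": "2021-01-07", "customerId": "1234-5678", "action": "viewed-item", "item-id": 555},
    {"time": "2021-01-11", "customerId": "1234-5678", "action": "logged-in"},
    {"time": "2021-01-11", "customerId": "1234-5678", "action": "viewed-item", "item-id": 444},
]


def init_state():
    """Counterpart of init_script: an empty per-shard state."""
    return {"productsViewed": [], "logins": []}


def map_event(state, event):
    """Counterpart of map_script: keep only the abbreviated fields of each event."""
    if event["action"] == "viewed-item":
        state["productsViewed"].append({"time": event["time"], "item-id": event["item-id"]})
    elif event["action"] == "logged-in":
        state["logins"].append({"time": event["time"]})


def combine(state):
    """Counterpart of combine_script: hand the per-shard state to the reduce phase."""
    return state


def reduce_states(states):
    """Counterpart of reduce_script: merge the shard states and restore time order."""
    merged = init_state()
    for state in states:
        merged["productsViewed"].extend(state["productsViewed"])
        merged["logins"].extend(state["logins"])
    # Shard results arrive in no guaranteed order, so sort explicitly.
    merged["productsViewed"].sort(key=lambda e: (e["time"], e["item-id"]))
    merged["logins"].sort(key=lambda e: e["time"])
    return merged


def build_entities(events, shards=2):
    """Group events by customerId and collapse each group into one entity document."""
    per_customer = defaultdict(lambda: [init_state() for _ in range(shards)])
    for i, event in enumerate(events):
        # Spread events across simulated shards (hypothetical round-robin split).
        map_event(per_customer[event["customerId"]][i % shards], event)
    return [
        {"customerId": customer_id, **reduce_states([combine(s) for s in states])}
        for customer_id, states in per_customer.items()
    ]


entities = build_entities(EVENTS)
```

With the sample events, entities holds a single document for customer 1234-5678 with the productsViewed and logins arrays from the desired output. In a real transform the same logic would live in the Painless scripts of a scripted_metric aggregation under the pivot.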

My advent calendar post from 2019 does the kind of collapsing you describe. It is written in German, but maybe you can use your browser's built-in translation or a web service to translate it:

Hi. Thanks so much for the response and the link to your article. You covered my use case perfectly.

When I started exploring transforms, I had written a basic scripted_metric aggregation within a transform to accomplish my goal, but it felt like I was fighting against the current. Conceptually I feel like aggregations are designed for summarizing documents, not embedding documents, so I openly wondered if there was a different approach to take. It is encouraging to see others following the same approach that I was contemplating, so I will move forward with scripted_metric for now.

Thanks again!
