Entity-centric indexing with Transforms

Hello, we're working with Elasticsearch for the first time and are currently deciding on the best solution for the problem at hand.

We are receiving event-based logs (in JSON form) from our applications directly into an Elasticsearch index. These logs are highly interconnected (they share a common unique ID), and we therefore need to convert/aggregate them in an entity-centric fashion.

Each event usually records a status change in the target field; there are more statuses than just start/end. Each document also carries additional data that could be used to create more than one entity-centric index. For example:

```json
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStart": { "timestamp": "2020-06-01T13:50:55.000Z" }
  }
}
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStop": { "timestamp": "2021-06-01T13:50:55.000Z" }
  }
}
```

We were already able to join these documents using Python or Logstash. We basically created an index that contains the following documents:

```json
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStart": { "timestamp": "2020-06-01T13:50:55.000Z" },
    "eventStop": { "timestamp": "2021-06-01T13:50:55.000Z" },
    "time_dif_Start_Stop": xxxx
  }
}
```

We assigned each event a document ID equal to its uniqueID, so indexing the second event automatically updated the existing document. The next step simply calculated the difference between the eventStart and eventStop timestamps.
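As a rough sketch of that approach (the index name `entity-index` is our own placeholder; the semantics are standard Elasticsearch partial-document upserts), each event can be written keyed by its uniqueID so that subsequent events merge into the same document:

```json
POST entity-index/_update/ain123in145512kn
{
  "doc": {
    "uniqueID": "ain123in145512kn",
    "name": "Bob",
    "target": {
      "eventStop": { "timestamp": "2021-06-01T13:50:55.000Z" }
    }
  },
  "doc_as_upsert": true
}
```

Because partial updates merge nested objects recursively, the `eventStart` written by the first event survives when the `eventStop` event arrives.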

We have certain requirements for our pipeline, so we would prefer that the data never leaves Elasticsearch. We are therefore wondering whether this can be done with any of the tools that already exist in the ELK stack or are hosted on Elastic Cloud. We tried using Transforms, but we were only able to calculate aggregated fields in a new index. Is it also possible to merge/update all the documents into a single one with this tool, or any other? That would be ideal for us, as Transforms run on a schedule and we would not need any external tools to modify documents.

Any other suggestions or help would also be greatly appreciated.

Hi,

It seems like transforms should fit your needs, but it would be good to know more details.

If you tried using transforms, could you show the config you were using for that?
What do you mean by "only able to calculate aggregated fields"? eventStart should be the result of a min aggregation; similarly, eventStop should be the result of a max aggregation.
Is it time_dif_Start_Stop that is problematic for you? It looks like it could be calculated by an ingest pipeline attached to your destination (entity-centric) index.
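For example (the pipeline name is illustrative; field names are taken from your documents), a script processor can compute the difference once both timestamps are present:

```json
PUT _ingest/pipeline/time-dif-pipeline
{
  "processors": [
    {
      "script": {
        "description": "Compute eventStop - eventStart in milliseconds",
        "if": "ctx.target?.eventStart != null && ctx.target?.eventStop != null",
        "source": """
          ZonedDateTime start = ZonedDateTime.parse(ctx.target.eventStart.timestamp);
          ZonedDateTime stop  = ZonedDateTime.parse(ctx.target.eventStop.timestamp);
          ctx.target.time_dif_Start_Stop = ChronoUnit.MILLIS.between(start, stop);
        """
      }
    }
  ]
}
```

The pipeline can then be referenced in the transform's `dest.pipeline` setting so it runs on every document the transform writes.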

No, the time_dif_Start_Stop is not problematic; we are able to calculate it with scripted metrics and write it to the destination index. What we are wondering is how to also "transfer" some of the existing fields (that are not part of the aggregations and calculations) from the source index to the destination index, based on the shared ID (uniqueID).

If there are not many such fields, you can put them in the group_by section of the transform config.
Of course, in that case they are not meant to be used for grouping (grouping is already achieved by uniqueID), but they will be present in the destination index.
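A minimal sketch of such a config (the transform and index names, and the `name.keyword` field, are assumptions based on the example documents):

```json
PUT _transform/entity-centric-events
{
  "source": { "index": "event-logs" },
  "dest": { "index": "entity-centric-index" },
  "pivot": {
    "group_by": {
      "uniqueID": { "terms": { "field": "uniqueID" } },
      "name":     { "terms": { "field": "name.keyword" } }
    },
    "aggregations": {
      "target.eventStart.timestamp": { "min": { "field": "target.eventStart.timestamp" } },
      "target.eventStop.timestamp":  { "max": { "field": "target.eventStop.timestamp" } }
    }
  }
}
```

Here `name` is a pass-through field: since all events sharing a uniqueID have the same name, adding it to group_by does not create extra buckets, but it does appear in each destination document.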

Please note, however, that if you have many such fields, it can impact performance of the transform.


Thanks for your answer, that is what we needed.