Hello, we're working with Elasticsearch for the first time and are currently trying to decide on the best solution for our problem at hand.
We are receiving event-based logs (in JSON form) from our applications directly into an Elasticsearch index. These logs are highly interconnected (they share a common unique ID), so we need to convert/aggregate them into an entity-centric form.
Each event usually carries a status change in the `target` field, and there are more statuses than just start/stop. Each document also contains additional data that could be used to build more than one entity-centric index. For example:
```json
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStart": { "timestamp": "2020-06-01T13:50:55.000Z" }
  }
}
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStop": { "timestamp": "2021-06-01T13:50:55.000Z" }
  }
}
```
We have already been able to join these documents using Python or Logstash. We basically created an index containing documents like the following:
```json
{
  "uniqueID": "ain123in145512kn",
  "name": "Bob",
  "target": {
    "eventStart": { "timestamp": "2020-06-01T13:50:55.000Z" },
    "eventStop": { "timestamp": "2021-06-01T13:50:55.000Z" },
    "time_dif_Start_Stop": xxxx
  }
}
```
We assigned each event a document ID equal to its `uniqueID`, so documents sharing that ID were merged automatically via updates. A subsequent step then calculated the difference between the `eventStart` and `eventStop` timestamps.
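For context, a simplified sketch of how that step can look with the official Python client (the connection details and the index names `raw-events` / `entity-centric` are placeholders, not our real setup):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder connection

# 1) Upsert every raw event into the entity-centric index, using uniqueID
#    as the document _id so events with the same ID merge into one document.
def entity_actions(raw_events):
    for event in raw_events:
        yield {
            "_op_type": "update",
            "_index": "entity-centric",   # placeholder index name
            "_id": event["uniqueID"],
            "doc": event,                 # partial docs merge recursively
            "doc_as_upsert": True,
        }

raw_events = (hit["_source"] for hit in helpers.scan(es, index="raw-events"))
helpers.bulk(es, entity_actions(raw_events))

# 2) Compute the start/stop difference (in ms) for documents that
#    already contain both timestamps.
es.update_by_query(
    index="entity-centric",
    body={
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": "target.eventStart.timestamp"}},
                    {"exists": {"field": "target.eventStop.timestamp"}},
                ]
            }
        },
        "script": {
            "lang": "painless",
            "source": """
                ctx._source.target.time_dif_Start_Stop = ChronoUnit.MILLIS.between(
                    Instant.parse(ctx._source.target.eventStart.timestamp),
                    Instant.parse(ctx._source.target.eventStop.timestamp))
            """,
        },
    },
)
```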
We have certain requirements for our pipeline, so we would prefer that the data never has to leave Elasticsearch. Therefore, we are wondering whether this can be done with any of the tools that already exist in the ELK stack or are hosted in Elastic Cloud. We tried using Transforms, but we were only able to calculate aggregated fields in a new index. Is it possible to also merge/update all the documents into a single one with this tool, or any other? That would be ideal for us, since Transforms run on a schedule and we would not need any external tools to modify documents.
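To make the question concrete, this is roughly the kind of continuous pivot transform we have in mind (only a sketch: the index names, the `@timestamp` sync field, and `uniqueID` being mapped as `keyword` are assumptions on our side):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder connection

# Pivot transform: one output document per uniqueID, with the earliest
# start, the latest stop, and their difference in milliseconds.
es.transform.put_transform(
    transform_id="entity-centric-events",      # placeholder name
    body={
        "source": {"index": "raw-events"},      # placeholder index
        "dest": {"index": "entity-centric"},    # placeholder index
        "frequency": "5m",
        # Continuous mode assumes a top-level ingest timestamp field.
        "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
        "pivot": {
            "group_by": {
                # Assumes uniqueID is mapped as a keyword field.
                "uniqueID": {"terms": {"field": "uniqueID"}}
            },
            "aggregations": {
                "eventStart": {"min": {"field": "target.eventStart.timestamp"}},
                "eventStop": {"max": {"field": "target.eventStop.timestamp"}},
                "time_dif_Start_Stop": {
                    "bucket_script": {
                        "buckets_path": {"start": "eventStart", "stop": "eventStop"},
                        # min/max on date fields resolve to epoch millis here
                        "script": "params.stop - params.start",
                    }
                },
            },
        },
    },
)
es.transform.start_transform(transform_id="entity-centric-events")
```

We assume other per-entity fields such as `name` could be carried over with something like a `top_metrics` or `scripted_metric` aggregation, but we are not sure whether this approach gets us all the way to fully merged documents, hence the question.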
Any other suggestions or help would also be greatly appreciated.