Add a new field to a document based on the value of a previous document with the same correlation id

Hello. I want to index the following logs in Elasticsearch:

message1 correlation1 urlfield
message2 correlation6
message3 correlation1
message4 correlation1

I want documents message3 and message4 to contain the urlfield field of the first correlation1 document (message1).
How can I do this? I cannot use Logstash because the logs are balanced across multiple Logstash instances and arrive out of order, so I probably have to do it somehow on the Elasticsearch side...

I use the following components: Elasticsearch, Logstash, Kibana.

Create an entity centric index.
See https://youtu.be/yBf7oeJKH2Y
The comments section for the video has the link to example code and data

Is it suitable for log management usage? I have just one index, "filebeat-app", for one app. I found the Logstash elasticsearch input; could I use it for this? filebeat-app is quite big (800G+) ...

Is it suitable for log management usage?

You'd need to implement your own life-cycle management policies for entities. Logs are typically stored in time-based indices and retained for fixed periods but entities can span these time periods and need their own retention policy.
You should adopt an incremental update approach to your entities based on latest changes in your event store (aka logs). This thread got into the specifics of how to implement a robust incremental update pipeline for entities.
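
To make the incremental idea concrete, here is a minimal sketch in Python. It is not the pipeline from the linked thread. It assumes each event carries an ISO-8601 `@timestamp` in a single consistent format (so plain string comparison orders them correctly) and that you persist a checkpoint between runs; the function names are made up for illustration.

```python
from typing import Iterable, Optional


def build_incremental_query(checkpoint: Optional[str]) -> dict:
    """Build a search body that selects only events newer than the checkpoint.

    With no checkpoint yet (first run) every event is selected.
    """
    if checkpoint is None:
        query = {"match_all": {}}
    else:
        # Assumption: events have an "@timestamp" field, as Filebeat events usually do.
        query = {"range": {"@timestamp": {"gt": checkpoint}}}
    return {"query": query, "sort": [{"@timestamp": "asc"}]}


def advance_checkpoint(checkpoint: Optional[str], events: Iterable[dict]) -> Optional[str]:
    """Return the newest timestamp seen so far, to be stored for the next run."""
    latest = checkpoint
    for event in events:
        ts = event["@timestamp"]
        if latest is None or ts > latest:
            latest = ts
    return latest
```

Since your logs arrive out of order, a checkpoint on the event time alone could skip late arrivals; checkpointing on an ingest-time field instead may be safer, but that is a design choice this sketch does not cover.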

OK. Got the principle. Is it required to do it with external scripts/tools? Or is it a good idea to implement it using logstash elasticsearch input + filter or using watcher scheduler?

or using watcher scheduler?

Watcher relies on aggregations, and their request/response interaction puts a memory limit on how much data on individual entities you can bring back from the event store. In contrast, the scroll API is a streaming interface and therefore lets you process larger volumes of entity events.

You can piece the pull-and-push task together with whatever agent can use the scroll and bulk APIs respectively to create the flow of data from event store to entity store. I used Python, but that's not the only possible client.
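
A rough sketch of that pull-and-push flow, with the Elasticsearch calls replaced by in-memory stand-ins so it runs on its own. In a real setup the hits would likely come from something like `elasticsearch.helpers.scan` (which wraps the scroll API) and each batch would be handed to `elasticsearch.helpers.bulk`. The index name `entities` and the fields `correlation_id` and `message` are assumptions for illustration only.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` items without loading everything at once."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_pipeline(
    hits: Iterable[dict],
    to_action: Callable[[dict], dict],
    send_bulk: Callable[[List[dict]], None],
    chunk_size: int = 500,
) -> int:
    """Stream hits from the event store, convert them, and push them in batches."""
    sent = 0
    for batch in chunked((to_action(hit) for hit in hits), chunk_size):
        send_bulk(batch)
        sent += len(batch)
    return sent


def demo_to_action(hit: dict) -> dict:
    """Hypothetical conversion: one update action per event, keyed by correlation id."""
    src = hit["_source"]
    return {
        "_op_type": "update",
        "_index": "entities",  # assumed entity index name
        "_id": src["correlation_id"],
        "doc": {"last_message": src["message"]},
        "doc_as_upsert": True,
    }


# Stand-ins for the scroll results and for the bulk sender.
sample_hits = [
    {"_source": {"correlation_id": "correlation1", "message": "message1"}},
    {"_source": {"correlation_id": "correlation6", "message": "message2"}},
    {"_source": {"correlation_id": "correlation1", "message": "message3"}},
]
received: List[List[dict]] = []
total = run_pipeline(sample_hits, demo_to_action, received.append, chunk_size=2)
print(total, [len(batch) for batch in received])
```

Because both ends are plain iterables and callables, the same loop works whether the source is a scroll over filebeat-app or a test fixture.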

However the in-place update of entity docs on the shards where they live is best done with a Painless script.
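
As an illustration only (not the script from the video), a scripted upsert could look roughly like this. The sketch builds the newline-delimited body for the bulk API in Python; the entity document is keyed by the correlation id, keeps the `urlfield` from whichever event carries it, and collects the messages. The index name `entities`, the `messages` list, and the event field names `correlation_id` and `message` are my assumptions.

```python
import json
from typing import Iterable, List

# Painless source executed on the shard holding the entity document.
PAINLESS_SOURCE = """
if (params.url != null) {
  ctx._source.urlfield = params.url;
}
if (ctx._source.messages == null) {
  ctx._source.messages = [];
}
ctx._source.messages.add(params.message);
"""


def entity_update_lines(event: dict, entity_index: str = "entities") -> List[str]:
    """Return the two bulk lines (action + body) for one event."""
    action = {"update": {"_index": entity_index, "_id": event["correlation_id"]}}
    body = {
        # Run the script even when the entity does not exist yet.
        "scripted_upsert": True,
        "script": {
            "lang": "painless",
            "source": PAINLESS_SOURCE,
            "params": {"url": event.get("urlfield"), "message": event["message"]},
        },
        "upsert": {},
    }
    return [json.dumps(action), json.dumps(body)]


def bulk_payload(events: Iterable[dict]) -> str:
    """Join all action/body lines into a bulk request body (must end with a newline)."""
    lines: List[str] = []
    for event in events:
        lines.extend(entity_update_lines(event))
    return "\n".join(lines) + "\n"
```

With this shape it does not matter whether message3 arrives before message1: the entity for correlation1 ends up with the urlfield as soon as any event that has one is processed.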
