Add a new field to document based on the value of the previous with the same correlation id

vb4t · August 9, 2017, 6:36am

Hello. I want to index the following logs in Elasticsearch:

message1 correlation1 urlfield
message2 correlation6
message3 correlation1
message4 correlation1

I want documents message3 and message4 to contain the urlfield value from the first correlation1 document (message1).
How can I do this? I cannot use Logstash because the logs are balanced across multiple Logstash instances and arrive out of order, so I probably have to do it on the Elasticsearch side.

I use the following components: Elasticsearch, Logstash, Kibana.

Mark_Harwood · August 9, 2017, 6:44am

Create an entity centric index.
See https://youtu.be/yBf7oeJKH2Y
The comments section for the video has the link to example code and data

vb4t · August 9, 2017, 10:31am

Is it suitable for log management usage? I have just one index, "filebeat-app", for one app. I found the Logstash elasticsearch input. Could I use it for this? filebeat-app is quite big (800G+).

Mark_Harwood · August 9, 2017, 10:37am

Is it suitable for log management usage?

You'd need to implement your own life-cycle management policies for entities. Logs are typically stored in time-based indices and retained for fixed periods but entities can span these time periods and need their own retention policy.
You should adopt an incremental update approach to your entities based on latest changes in your event store (aka logs). This thread got into the specifics of how to implement a robust incremental update pipeline for entities.
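
As a rough illustration of the incremental approach, each run could fetch only the events newer than the last processed timestamp. This is a minimal sketch, not the code from the video: the helper name `incremental_query`, the `@timestamp` field, and the `correlation_id` field are assumptions about how the logs are mapped.

```python
def incremental_query(last_run_iso, timestamp_field="@timestamp",
                      entity_field="correlation_id"):
    """Build a search body that selects only events newer than the last run.

    Field names are assumptions; adjust them to the actual index mapping.
    Sorting by the entity key lets the consumer group events per entity
    as it streams through the results.
    """
    return {
        "query": {"range": {timestamp_field: {"gt": last_run_iso}}},
        "sort": [entity_field, timestamp_field],
    }


# Example: events indexed since the previous run finished.
body = incremental_query("2017-08-09T10:00:00Z")
```

The timestamp of the last run has to be persisted somewhere between runs; where to keep it is left open here.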

vb4t · August 9, 2017, 11:27am

OK, I get the principle. Does it have to be done with external scripts/tools? Or is it a good idea to implement it using the Logstash elasticsearch input + filter or using watcher scheduler?

Mark_Harwood · August 9, 2017, 11:36am

or using watcher scheduler?

Watcher will rely on aggregations whose request/response interaction will put a memory limit on how much data on individual entities you can bring back from the event store. In contrast, the scroll api is a streaming interface and therefore allows you to process larger volumes of entity events.

You can piece the pull-and-push task together with whatever agent can use the scroll and bulk apis respectively to create the flow of data from event store to entity store. I used Python but that's not the only possible client.
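
A minimal sketch of such a pull-and-push flow in Python might look like the following. It assumes the `elasticsearch` Python client, events that carry hypothetical `correlation_id`, `message` and `url` fields, and a hypothetical entity index named `entities`; none of these names come from the thread. The grouping step expects the scroll results to arrive sorted by correlation id.

```python
from itertools import groupby
from operator import itemgetter


def entity_actions(hits, entity_index="entities"):
    """Turn event hits (sorted by correlation id) into one bulk action per entity."""
    sources = (hit["_source"] for hit in hits)
    for cid, events in groupby(sources, key=itemgetter("correlation_id")):
        events = list(events)
        doc = {"messages": [e["message"] for e in events]}
        # Take the url from whichever event of this entity carries one.
        url = next((e["url"] for e in events if e.get("url")), None)
        if url is not None:
            doc["url"] = url
        yield {
            "_op_type": "update",
            "_index": entity_index,
            "_id": cid,
            "doc": doc,
            "doc_as_upsert": True,  # create the entity doc if it is missing
        }


def rebuild_entities(client, event_index="filebeat-app"):
    """Stream events with the scroll API and push entity updates in bulk."""
    from elasticsearch import helpers  # assumes elasticsearch-py is installed

    hits = helpers.scan(
        client,
        index=event_index,
        query={"sort": ["correlation_id"]},  # assumed to be a keyword field
        preserve_order=True,  # keep the sort; this makes scrolling more expensive
    )
    helpers.bulk(client, entity_actions(hits))
```

Note that a partial-document upsert like this replaces the `messages` list on every run, so it only suits a full rebuild. Merging new events into an existing entity needs a scripted update.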

However the in-place update of entity docs on the shards where they live is best done with a Painless script.
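
To illustrate, a scripted upsert could keep the first URL seen for an entity and append new messages to it. This is only a sketch under assumptions: the field names `url` and `messages`, the index name `entities`, and the helper `scripted_update` are hypothetical, and older Elasticsearch releases expect the script key `inline` where newer ones use `source`.

```python
# Painless script: runs on the shard that holds the entity document.
PAINLESS_SOURCE = """
if (ctx._source.url == null && params.url != null) {
  ctx._source.url = params.url;
}
if (ctx._source.messages == null) {
  ctx._source.messages = [];
}
ctx._source.messages.addAll(params.messages);
"""


def scripted_update(cid, messages, url=None, entity_index="entities"):
    """Build one bulk update action that merges new events into an entity doc."""
    return {
        "_op_type": "update",
        "_index": entity_index,
        "_id": cid,
        "scripted_upsert": True,  # run the script even when the doc is new
        "upsert": {},             # start from an empty entity document
        "script": {
            "lang": "painless",
            "source": PAINLESS_SOURCE,  # "inline" on older versions
            "params": {"url": url, "messages": messages},
        },
    }


# message1 carries the URL; message3 arrives later without one.
first = scripted_update("correlation1", ["message1"], url="urlfield")
later = scripted_update("correlation1", ["message3"])
```

Because the script only sets `url` when the entity does not have one yet, later events without a URL leave the stored value intact.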

system · September 6, 2017, 11:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
