Adding an Expensive Field across Legacy data

Randall_Hand · July 11, 2025, 4:13pm

I have about 2 years of data imported into my ELK stack, and it's working great. The logs come from our native desktop application and are imported via Logstash twice a day.

I would like to reprocess this historical data to add an expensive field calculation. Occasionally we get a stacktrace in the logs, and I would like to add an MD5 checksum to that one line/record so that it can be searched for and correlated across multiple occurrences. Across all my data, there are currently only 1,150 events that match a 'message: FATAL' search, and those are the records I would like to update.
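For reference, the checksum itself is cheap once the stacktrace text is in hand; the expensive part is reprocessing the stored data. A minimal Python sketch, assuming the trace is already available as a string. The function name `stacktrace_md5` and the whitespace normalization are assumptions for illustration, not something described in the thread:

```python
import hashlib


def stacktrace_md5(stacktrace: str) -> str:
    """Return the hex MD5 digest of a stacktrace string.

    Assumption: line endings are normalized and trailing whitespace is
    stripped from each line first, so the same trace logged with CRLF or LF
    endings yields the same checksum.
    """
    lines = stacktrace.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines).strip("\n")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Two occurrences of the same trace then map to the same 32-character hex string, which can be stored in a keyword field and searched or aggregated on.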

What's a good way to do this? This doesn't seem well suited to the overhead of a runtime field, and I already have all this data imported that I want to modify (with more coming in daily).

stephenb · July 11, 2025, 5:43pm

Hi @Randall_Hand, welcome to the community, and cool question.

Just trying to get a bit of a grasp, because the issue will be in the details here.

What version of the stack are you on?

Are you using "regular" indices or data streams? How many indices are we talking about? Are there searchable snapshots involved?

Are you saying you really just want to update ~1,200 events across the entire multi-year data set?

Are you saying that you can easily identify which documents you want to update?

Because, tl;dr, if you are using regular indices and you can pull all ~1,200 documents, I would just:
Get all the documents (complete source).
Compute the new field.
Then update the docs directly using the document Update API, since you will have the entire document, the index, and the document ID.
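The three steps above could look roughly like this in Python. This is a sketch under assumptions: the hits are assumed to have the shape a search response returns (`_index`, `_id`, `_source`), and both the `compute` callback and the target field name `stacktrace_md5` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def build_updates(
    hits: Iterable[dict],
    compute: Callable[[dict], Optional[str]],
    field: str = "stacktrace_md5",  # hypothetical target field name
) -> Iterator[Tuple[str, str, dict]]:
    """Yield (index, id, partial_doc) for every hit that gets a new value.

    `compute` receives the full _source of a hit and returns the new field
    value, or None when there is nothing to add for that document.
    """
    for hit in hits:
        value = compute(hit["_source"])
        if value is None:
            continue  # e.g. a FATAL event without a recognizable stacktrace
        # Keep the index and id from the hit: with weekly indices, each
        # document must be updated in the index it actually lives in.
        yield hit["_index"], hit["_id"], {field: value}
```

With the official Python client (elasticsearch-py 8.x), each tuple could then be sent with `es.update(index=index, id=doc_id, doc=doc)`. The Update API merges a partial `doc` into the stored source, so only the new field needs to be sent; the full source is needed only to compute the value.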

Perhaps I am missing something.

And, of course, you will need to handle the ongoing data as well.

Randall_Hand · July 11, 2025, 5:56pm

That's pretty much it. These are regular indices, weekly, so I have ~100 for the last two years. I can easily identify the records with a simple 'message: FATAL' query.
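Since the matching set is small, a single search request can likely return all of it. A sketch of the request body, assuming `message` is a text field (a `match` query is roughly what the KQL `message: FATAL` expands to). The `size` default of 2,000 is an arbitrary choice above the ~1,150 expected hits; the default `index.max_result_window` of 10,000 caps how many hits one request can return.

```python
def fatal_search_body(size: int = 2000) -> dict:
    """Build a search body for the FATAL events.

    Assumption: `message` is a text field, so a match query approximates
    the KQL search `message: FATAL`.
    """
    if not 0 < size <= 10_000:
        # 10,000 is the default index.max_result_window; larger result sets
        # would need search_after or a scroll instead of a single request.
        raise ValueError("size must be between 1 and 10,000")
    return {
        "query": {"match": {"message": "FATAL"}},
        "size": size,
    }
```

With elasticsearch-py 8.x this could be passed as `es.search(index="logs-*", **fatal_search_body())`, where the `logs-*` pattern is hypothetical and would need to match the weekly indices.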

And I'm on 8.7.1, self-hosted.

stephenb · July 11, 2025, 5:59pm

Yeah, I would say don't overcomplicate it.

ChatGPT could probably write a Python program that does all of this for you.

leandrojmp · July 11, 2025, 9:39pm

You want to generate an MD5 checksum of the value of a specific field and add it to the existing document?

I think this can be done with Logstash.

Randall_Hand · July 11, 2025, 9:52pm

A subset of a specific field: only if the field (message) contains a stacktrace, and then the MD5 of only the stacktrace, not any of the leading/following lines.
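Isolating only the stacktrace depends entirely on how the application formats it, which the thread doesn't show. A sketch assuming frame lines can be recognized by a prefix; `FRAME_RE` below is hypothetical (Java/.NET-style `at ...` or gdb-style `#0 ...` frames) and would need to be adapted to the real log format:

```python
import re
from typing import List, Optional

# Hypothetical frame pattern: a line starting with optional indentation
# followed by "at " (Java/.NET style) or "#<n> " (gdb style).
FRAME_RE = re.compile(r"^\s*(?:at |#\d+\s)")


def extract_stacktrace(message: str, frame_re: re.Pattern = FRAME_RE) -> Optional[str]:
    """Return the first contiguous block of frame lines in `message`, or None.

    Leading lines before the first frame and trailing lines after the last
    frame are excluded, so only the trace itself ends up being hashed.
    """
    block: List[str] = []
    for line in message.splitlines():
        if frame_re.match(line):
            block.append(line)
        elif block:
            break  # the trace has ended; ignore any following lines
    return "\n".join(block) if block else None
```

The returned block is what would then be passed to `hashlib.md5`; a `None` result means no trace was recognized, and the document can be left untouched.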
