I have about 2 years of data imported into my ELK stack, and it's working great. It's logs from our native desktop application, imported via Logstash twice a day.
I would like to reprocess this historical data to add an expensive field calculation. Occasionally we get a stacktrace in the logs, and I would like to add an MD5 checksum to that one line/record, so that it can be searched for and correlated across multiple occurrences. Across all my data, there are currently only 1,150 events that match a 'message: FATAL' search, and those are the records I would like to update.
What's a good way to do this? This doesn't seem well-suited to the overhead of a runtime field, and I have all this data already imported that I want to modify (and more coming in daily).
Hi @Randall_Hand, welcome to the community, and cool question.
Just trying to get a bit of a grasp, because the issue will be in the details here.
What version of the stack are you on?
Are you using "regular" indices or data streams? How many indices are we talking about? Are there searchable snapshots involved?
Are you saying you really just want to update ~1200 events across the entire multi-year data set?
Are you saying that you can easily identify which documents you want to update?
Because, tl;dr, if you are using regular indices...
And you can pull the entire 1200 documents...
I would just get all the documents (complete source)
Compute the new field
Then update the docs directly using the document update API, since you will have the entire document, the index and the document id.
Perhaps I am missing something.
Well, and you will need to fix the ongoing data as well.
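The steps above (pull the matching documents, compute the new field, update by index and id) could be sketched roughly like this. It only builds the NDJSON body for the `_bulk` API from search hits and leaves the HTTP calls out. The field name `stacktrace_md5`, the helper names, and the `md5_of_message` placeholder are assumptions for illustration, not anything from the thread; the hits are assumed to have the usual `_index` / `_id` / `_source` shape of a search response.

```python
import hashlib
import json
from typing import Callable, Iterable, Optional


def build_bulk_updates(hits: Iterable[dict],
                       compute: Callable[[str], Optional[str]],
                       field: str = "stacktrace_md5") -> str:
    """Build an NDJSON _bulk body with one partial update per search hit.

    `field` is a hypothetical name for the new checksum field.
    """
    lines = []
    for hit in hits:
        value = compute(hit["_source"].get("message", ""))
        if value is None:
            continue  # nothing to add for this document
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": hit["_index"], "_id": hit["_id"]}}))
        # Payload line: partial document merged into the existing source.
        lines.append(json.dumps({"doc": {field: value}}))
    # The bulk API expects a newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def md5_of_message(message: str) -> Optional[str]:
    """Placeholder: hashes the whole message. Swap in a function that
    isolates the stacktrace before hashing."""
    if not message:
        return None
    return hashlib.md5(message.encode("utf-8")).hexdigest()
```

The resulting string would be POSTed to `_bulk` with `Content-Type: application/x-ndjson`. With only ~1200 documents, a single request is likely enough, though that is a guess about document sizes.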
That's pretty much it. These are regular weekly indices, so I have ~100 for the last two years. I can easily identify the records with a simple 'message: FATAL' query.
A subset of a specific field. Only if the field (message) contains a stacktrace, and then the MD5 of only the stacktrace, not any of the leading/following lines.
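As a rough sketch of "MD5 of only the stacktrace": the snippet below pulls the first contiguous run of frame lines out of a message and hashes just those. The actual stacktrace format is not shown in this thread, so the frame pattern (lines starting with `at ` or `#<n> `) is purely an assumption and would need adjusting to the real logs; the function names are hypothetical too.

```python
import hashlib
import re
from typing import Optional

# Assumption: a stack frame line is optional whitespace followed by
# "at " or "#<n> ". Adjust this to match the real log format.
FRAME_RE = re.compile(r"^\s*(?:at\s|#\d+\s)")


def extract_stacktrace(message: str) -> Optional[str]:
    """Return the first contiguous run of frame lines, or None if there is none."""
    frames = []
    for line in message.splitlines():
        if FRAME_RE.match(line):
            frames.append(line.strip())
        elif frames:
            break  # the first run of frames has ended
    return "\n".join(frames) if frames else None


def stacktrace_md5(message: str) -> Optional[str]:
    """MD5 hex digest of the stacktrace only, ignoring surrounding lines."""
    trace = extract_stacktrace(message)
    if trace is None:
        return None
    return hashlib.md5(trace.encode("utf-8")).hexdigest()
```

Because leading and trailing lines are dropped and each frame is stripped of surrounding whitespace, two FATAL records with the same frames but different surrounding text get the same checksum, which is what makes correlation across occurrences possible. Messages without any frame lines return `None`, so they can be skipped.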