Logstash action "Update"

I have several huge files (hundreds of GBs) that I want to "double parse" with logstash.

I have one file which maps user IDs to user emails, and then another huge file which maps user IDs to their actions.

So in my first pass (nothing unusual), I'll index the user ID and action. In the second pass, I'll use the Logstash elasticsearch output's action => "update" to add the email address to the documents that are already indexed.
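For reference, here is a minimal sketch of what that second pass could look like. It makes several assumptions that are not stated above: the email file is a CSV of "user_id,email" lines, the index is named user-actions, and the first pass indexed one document per user with document_id set to the user ID. The path and host are hypothetical placeholders.

```
# Second pass: read the ID-to-email file and update existing documents.
input {
  file {
    path => "/data/user_emails.csv"    # hypothetical path
    start_position => "beginning"      # read the file from the start
    sincedb_path => "/dev/null"        # do not remember the read position between runs
  }
}

filter {
  csv {
    columns => ["user_id", "email"]    # assumed column layout
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"] # hypothetical host
    index => "user-actions"            # assumed index name from the first pass
    action => "update"                 # partial update of an existing document
    document_id => "%{user_id}"        # must match the ID used in the first pass
    # doc_as_upsert => true            # optional: create the document if it is missing
  }
}
```

An update needs a document ID, so this only works if the first pass indexed with a predictable ID rather than letting Elasticsearch generate one.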

However, I'm not very clear on how Elasticsearch performs this update. Does it delete (hide) the old data and make a new copy? If so, does the old data ever get deleted? Am I doing something completely wrong here?

Thanks!

However, I'm not very clear on how Elasticsearch performs this update. Does it delete (hide) the old data and make a new copy?

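Yes. Every update marks the old document as deleted and indexes a new copy, so with files this size you would effectively index everything twice. Avoid that if you can. Maybe you can load the ID-to-email lookup table into a database and use a jdbc_static or jdbc_streaming filter to add the email while the actions are being indexed, so each document is written only once.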
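A rough sketch of the jdbc_streaming variant, to be placed in the pipeline that reads the actions file. Everything specific here is an assumption for illustration: a PostgreSQL database named lookup, a table user_emails(user_id, email), the driver path, the credentials, and an event field called user_id.

```
filter {
  jdbc_streaming {
    jdbc_driver_library => "/path/to/postgresql.jar"                 # hypothetical driver path
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/lookup"  # hypothetical database
    jdbc_user => "logstash"                                          # hypothetical credentials
    jdbc_password => "changeme"
    # Look up the email for the user ID carried by the current event.
    statement => "SELECT email FROM user_emails WHERE user_id = :uid"
    parameters => { "uid" => "user_id" }                             # :uid is taken from the event's user_id field
    target => "user_email"                                           # rows returned by the query are stored here
  }
}
```

The target field receives a list of result rows, so you may want a follow-up mutate or ruby filter to pull the single email value out. jdbc_static is the alternative if you would rather copy the lookup table into a local in-memory store instead of querying the database per event.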

If so, does the old data ever get deleted?

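Yes, when the segment files are rewritten. Until then the old copies are only marked as deleted and still take up disk space. The rewrite happens when segments are merged, which I believe is done automatically in the background once there are too many of them, but check the ES docs.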
