Best practice for building a new version of an index and replacing the existing one using Logstash


(Aaron Bauman) #1

I am pulling entities out of a database and serializing them to .json using a C# console app.
The .json files are then read by Filebeat, which sends the data to Logstash, which pushes it into Elasticsearch.

Every 5 minutes I will query the database for entities that have changed.
Once a day (or week) I would like to completely rebuild the entire index, just to make sure we haven't missed anything.

I understand the concept of aliases and intend to use one for this.

How should I configure Logstash to push simple changes to the existing index, but push the complete rebuild to the next version?
I thought about putting a day (or week, depending on the rebuild threshold) timestamp in the index name in the Logstash Filebeat-to-Elasticsearch config, but that would simply create a new version; I wouldn't know when to swap out the old index for the new one.

I could have my console app handle this, but I'm not sure how it would know that Logstash was done pushing files to the new index...

Also, it would be nice, and maybe even imperative, to be able to queue up an entire rebuild on demand. I am new to Elasticsearch and I fear I may need to react quickly in the event I do something wrong.

Looking for some seasoned input. Maybe I'm thinking about this all wrong and there is a better route to go.


(Mark Walkom) #2

You need to be careful of sql_last_value. Have a separate config file somewhere that LS doesn't auto-read, then run that via cron or something.

What does filebeat have to do with this though?


(Aaron Bauman) #3

I am using a custom app to pull data from the database, not JDBC. Nonetheless, thank you for the warning about sql_last_value.

My apologies: when I said "filebeat config" I meant my Logstash config file that pulls from Filebeat and pushes to Elasticsearch. (The file name is "filebeat.config".)

I thought about having a separate Logstash config file for the "rebuild" job and running it separately, but I still wasn't sure how to coordinate the swap from the old index to the new one after the "rebuild" is complete.


(Mark Walkom) #4

Try https://www.elastic.co/blog/changing-mapping-with-zero-downtime


(Aaron Bauman) #5

Yeah I read that article before and it provides some great thoughts. I already intend to use aliases, but those don't provide the solution I'm looking for.

When I create a new version of the index and Logstash pushes all of the documents into it, I need to know when it is complete in order to flip the alias to point at the new index, at which point I can delete the old one.
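
For reference, the alias flip itself can be done in a single atomic request to the Elasticsearch `_aliases` endpoint, so searches never see a moment with no index behind the alias. A minimal Python sketch follows. The base URL and the alias and index names (`entities`, `entities_v1`, `entities_v2`) are hypothetical and not taken from this thread, and `swap_alias` assumes an unsecured cluster reachable over plain HTTP.

```python
import json
import urllib.request


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    return {
        "actions": [
            # Both actions are applied together, so the alias is never empty.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def swap_alias(es_url, alias, old_index, new_index):
    """Send the swap to Elasticsearch; es_url is an assumed base URL."""
    body = json.dumps(build_alias_swap(alias, old_index, new_index)).encode("utf-8")
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_aliases",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Something like `swap_alias("http://localhost:9200", "entities", "entities_v1", "entities_v2")` would then be run once the rebuild is known to be complete; deleting the old index is a separate step afterwards.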

I could set a second job to run X minutes afterwards, but if the new index isn't ready to go...

I suppose I could count the documents that I put in the Filebeat input .json file and query the new version of the index in Elasticsearch for its document count. Once the count hits the expected number, I could flip the alias.
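
The count-polling idea could look roughly like the Python sketch below. It is only an illustration under assumptions: the base URL, index name, timeout, and poll interval are hypothetical, and `fetch_count` assumes an unsecured cluster. It relies on the `_count` endpoint, which only reports documents that have already been refreshed, so the number may lag slightly behind what Logstash has sent.

```python
import json
import time
import urllib.request


def fetch_count(es_url, index):
    """Return the document count reported by GET /<index>/_count."""
    url = es_url.rstrip("/") + "/" + index + "/_count"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["count"]


def wait_for_count(get_count, expected, timeout_s=600.0, poll_s=5.0, sleep=time.sleep):
    """Poll get_count() until it reaches `expected` or the timeout passes.

    Returns True if the expected count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A rebuild job might call `wait_for_count(lambda: fetch_count("http://localhost:9200", "entities_v2"), expected=n)` with `n` being the number of documents written to the .json file, and flip the alias only when it returns True. The timeout matters: if a document is rejected by Elasticsearch, the count never reaches the expected number, and the job should give up rather than wait forever.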

