I have a bunch of clients that are sending documents directly to my ES cluster via bulkIndex. Those documents are valid in 1.x, but are now invalid in 2.x. Specifically, a lot of the documents have periods in field names. Sadly, it won't be trivial or quick to upgrade all those clients.
My current plan is to stand up a separate 2.3 cluster. I can use Logstash's elasticsearch input/output plugins to copy over all the existing indices. My problem is that new data is being written all the time with the now-invalid documents. Is there anything in Logstash that will watch an index and slurp in just the new data? All the indices are daily indices; at worst I could set up a cron job to copy over the current day's indices every N time period, but I'd rather have it react to real traffic.
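For what it's worth, here's roughly the copy pipeline I have in mind for the existing indices. This is an untested sketch: the hostnames are placeholders, and I'm assuming the de_dot filter plugin (which rewrites the dots in field names that 2.x rejects) is installed:

```
input {
  elasticsearch {
    hosts   => ["old-cluster:9200"]   # 1.x cluster; placeholder hostname
    index   => "logstash-*"           # the existing daily indices
    docinfo => true                   # expose _index/_type/_id under @metadata
  }
}

filter {
  # Replace the periods 2.x won't accept in field names; requires the
  # logstash-filter-de_dot plugin (bin/plugin install logstash-filter-de_dot)
  de_dot {
    separator => "_"
  }
}

output {
  elasticsearch {
    hosts         => ["new-cluster:9200"]       # 2.3 cluster; placeholder hostname
    index         => "%{[@metadata][_index]}"   # keep the same daily index names
    document_type => "%{[@metadata][_type]}"
    document_id   => "%{[@metadata][_id]}"      # preserve ids so re-runs overwrite
  }
}
```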
OK, is there any way ES Watcher helps here? The only two other options I can see (short of upgrading every client, which, again, is out of my control) are:
1. writing some sort of HTTP proxy that intercepts and rewrites the bulk requests before they hit ES
2. waking the Logstash elasticsearch input every N minutes and re-indexing everything (sketched below). If I limit it to the past 24-48 hours, I think that's feasible. I'm not sending that much data; maybe 1 GB a day at most.
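Concretely, option 2 would just be the copy pipeline above with the input restricted to a recent window and kicked off from cron. The range query assumes the documents carry a `@timestamp` field; and because the ids are preserved, re-reading the same 48-hour window should overwrite documents rather than duplicate them:

```
input {
  elasticsearch {
    hosts   => ["old-cluster:9200"]
    index   => "logstash-*"
    docinfo => true
    # only pull the last 48 hours on each run; assumes @timestamp exists
    query   => '{ "query": { "range": { "@timestamp": { "gte": "now-48h" } } } }'
  }
}
# ...same de_dot filter and elasticsearch output as in the copy pipeline above
```

As far as I can tell, the elasticsearch input exits once its scroll finishes, so Logstash shuts down after each run and cron (say, every 15 minutes) just restarts it.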
Thoughts on either option, or on what I could have done to keep myself out of this hole? I guess I could have run everything through Logstash from the start, but since no transformation was needed at the time, it seemed wasteful. And to be honest, Elasticsearch has performed a lot more reliably IME than Logstash.