I have a bunch of clients that are sending documents directly to my ES cluster via bulkIndex. Those documents are valid in 1.x, but are now invalid in 2.x. Specifically, a lot of the documents have periods in field names. Sadly, it won't be trivial or quick to upgrade all those clients.
My current plan is to stand up a separate 2.3 cluster. I can use Logstash's elasticsearch input/output plugins to copy over all the existing indices. My problem is that new data is being written all the time with the now-invalid documents. Is there anything in Logstash that will watch an index and slurp in just the new data? All the indices are daily indices; at worst I could set up a cron job to copy over the current day's indices every N time period, but I'd rather have it react to real traffic.
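For what it's worth, here's roughly the copy pipeline I have in mind for the existing indices. This is an untested sketch: the hostnames are placeholders, and I'm assuming the de_dot filter plugin (which rewrites the dots in field names that 2.x rejects) is installed:

```
input {
  elasticsearch {
    hosts   => ["old-cluster:9200"]   # 1.x cluster; placeholder hostname
    index   => "logstash-*"           # the existing daily indices
    docinfo => true                   # expose _index/_type/_id under @metadata
  }
}

filter {
  # Replace the periods 2.x won't accept in field names; requires the
  # logstash-filter-de_dot plugin (bin/plugin install logstash-filter-de_dot)
  de_dot {
    separator => "_"
  }
}

output {
  elasticsearch {
    hosts         => ["new-cluster:9200"]       # 2.3 cluster; placeholder hostname
    index         => "%{[@metadata][_index]}"   # keep the same daily index names
    document_type => "%{[@metadata][_type]}"
    document_id   => "%{[@metadata][_id]}"      # preserve ids so re-runs overwrite
  }
}
```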
OK, is there any way ES Watcher helps here? The only two other options I can see (short of upgrading every client, which, again, is out of my control) are:
1. writing some sort of HTTP proxy that intercepts and rewrites the bulk requests before they hit ES
2. waking the Logstash elasticsearch input every N minutes and re-indexing everything (sketched below). If I limit it to the past 24-48 hours, I think that's feasible. I'm not sending that much data; maybe 1 GB a day at most.
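Concretely, option 2 would just be the copy pipeline above with the input restricted to a recent window and kicked off from cron. The range query assumes the documents carry a `@timestamp` field; and because the ids are preserved, re-reading the same 48-hour window should overwrite documents rather than duplicate them:

```
input {
  elasticsearch {
    hosts   => ["old-cluster:9200"]
    index   => "logstash-*"
    docinfo => true
    # only pull the last 48 hours on each run; assumes @timestamp exists
    query   => '{ "query": { "range": { "@timestamp": { "gte": "now-48h" } } } }'
  }
}
# ...same de_dot filter and elasticsearch output as in the copy pipeline above
```

As far as I can tell, the elasticsearch input exits once its scroll finishes, so Logstash shuts down after each run and cron (say, every 15 minutes) just restarts it.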
Thoughts on either option, or on what I could have done to keep myself out of this hole? I guess I could have run everything through Logstash from the start, but since no transformation was needed at the time, it seemed wasteful. And to be honest, Elasticsearch has performed a lot more reliably IME than Logstash.