Here's the situation. A source shipping logs to our cluster misbehaved and sent us a large number of documents containing undesirable fields. The problem is that the field names are almost random, so there is no way to list them all. They all share a common pattern, but each one differs in some way.
I am looking for a way to reindex the documents so I can remove those fields. However, every approach I have found so far requires the names of the fields to be removed to be known in advance. Is there any way to create a pipeline that could isolate the fields using a regular expression, or some other approach I am not seeing?
Any suggestion you might have will be very welcome.
That is precisely what I am trying to do. However, what I am not seeing is how to isolate the fields to be removed without naming and removing each one individually. What I cannot find is how to, say, loop over them using a regular expression, since they all have different names save for one common pattern that repeats in each name.
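To make the idea of looping over field names with a regular expression concrete, here is a minimal sketch in plain Python, working on a document as a dictionary outside of Elasticsearch. The pattern `junk_` and the sample field names are hypothetical stand-ins for whatever the unwanted fields have in common, and the sketch assumes the fields may also appear inside nested objects.

```python
import re

# Hypothetical pattern: stands in for the common part of the unwanted field names.
UNWANTED = re.compile(r"junk_")


def strip_fields(doc, pattern=UNWANTED):
    """Return a copy of doc without any key whose name matches pattern."""
    cleaned = {}
    for key, value in doc.items():
        if pattern.search(key):
            # The field name matches the pattern, so drop the field entirely.
            continue
        if isinstance(value, dict):
            # Recurse so nested objects are cleaned as well.
            value = strip_fields(value, pattern)
        cleaned[key] = value
    return cleaned


# Hypothetical sample document.
doc = {"message": "ok", "junk_a1b2": 1, "host": {"name": "web-1", "junk_zz": 2}}
print(strip_fields(doc))  # {'message': 'ok', 'host': {'name': 'web-1'}}
```

Something along these lines could presumably be applied to each document on the client side before writing it to a new index, so no field ever has to be named explicitly.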
Well, that worked. Hopefully a more flexible solution will exist in the not-too-distant future, but your suggestion worked for the time being. Thank you, good sir!