I have documents that currently contain a single _message field holding raw JSON. I'd like to reprocess all of these documents, run them through a JSON parser, and update each document with the resulting fields and values.
What's the best approach for this?
Should I write an external script that queries for a record needing an update, updates it, and pushes it back to Elasticsearch, or is there some other way?
You may want to look into using the reindex API together with an ingest node pipeline. This should work as long as all data is present within each document. If you want to parse and enrich it based on external data, you may need a different approach, e.g. Logstash.
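As a rough sketch of the reindex-plus-pipeline approach, the following Python builds an ingest pipeline with a `json` processor that parses `_message`, then reindexes through it. The cluster URL, the pipeline id `parse-message`, the index names, and the helper function names are all assumptions made for illustration. Whether to merge the parsed keys into the document root (`add_to_root`) or place them under a `target_field` depends on your mapping.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust URL and auth for your setup.
ES_URL = "http://localhost:9200"


def build_pipeline():
    """Ingest pipeline body: parse the raw JSON string held in _message."""
    return {
        "description": "Parse raw JSON held in _message",
        "processors": [
            # add_to_root merges the parsed keys into the top level of the doc.
            {"json": {"field": "_message", "add_to_root": True}}
        ],
    }


def build_reindex_body(source_index, dest_index, pipeline_id):
    """Reindex request body that routes every document through the pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


def send(method, path, body):
    """Send a JSON request to Elasticsearch and return the decoded response."""
    req = urllib.request.Request(
        f"{ES_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def reprocess(source_index, dest_index, pipeline_id="parse-message"):
    """Create (or overwrite) the pipeline, then reindex through it."""
    send("PUT", f"/_ingest/pipeline/{pipeline_id}", build_pipeline())
    return send(
        "POST",
        "/_reindex",
        build_reindex_body(source_index, dest_index, pipeline_id),
    )
```

Calling `reprocess("logs-raw", "logs-parsed")` (hypothetical index names) would write the parsed documents into a new index. If you would rather update the documents in place, the update by query API also accepts a `pipeline` parameter, so the same pipeline could be applied without creating a second index.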