Also, is there a way to do this in the Logstash config? It would save time with a file this large, because I'm getting a "gateway timeout" with the actual 22GB of data.
I'm not sure how Logstash would help in this case. Update by query can only be called via the REST API or through a client library.
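For example, a minimal update by query call over the REST API might look like the sketch below. The index name and script are placeholders; `wait_for_completion=false` is optional, but it makes Elasticsearch return a task id immediately instead of holding the HTTP connection open until the whole update finishes, which helps avoid gateway timeouts on large indices.

```
# Placeholder index and script, just to illustrate the REST call
POST my-index/_update_by_query?conflicts=proceed&wait_for_completion=false
{
  "script": {
    "source": "ctx._source.some_field = 'new_value'",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
```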
Also note that a gateway timeout doesn't mean the task was killed, just that your client connection timed out; the task itself keeps running in the background. If you run the command below, you'll see that the update is still in progress.
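A quick check via the Tasks API (`actions=*byquery` filters to update/delete by query tasks, `detailed=true` includes per-task progress counters):

```
# List by-query tasks that are still running, with progress details
GET _tasks?detailed=true&actions=*byquery
```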