After some experimenting and tweaking of the job definition, it turned out that the job was failing because the page_size was too large, which caused the search query to fail.
It would be good to get some feedback about this when checking the job status, because the manual debugging and tweaking took a while.
I also noticed that the rollup job just keeps going even when there is a bulk execution failure, which is not ideal in my case because it means documents are skipped. The bulk failure here was caused by the template mapping conflict below, which I will fix, but it would be good if the rollup job could be configured to fail on any bulk failure so that no data is lost:
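As a stopgap, the failure counters can be checked by hand. The sketch below is only an illustration: it assumes the response of `GET _rollup/job/<job_id>` contains a `jobs[].stats` object with `search_failures` and `index_failures` counters, which newer Elasticsearch versions report but the version discussed here may not. The `find_failing_jobs` helper, the job id and the sample response are all made up for the example.

```python
from typing import Any, Dict, List


def find_failing_jobs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per rollup job that reports search or bulk failures.

    Assumption: `response` has the shape of a `GET _rollup/job/<job_id>` body
    with `search_failures` / `index_failures` counters under `stats`.
    Missing counters are treated as zero.
    """
    failing = []
    for job in response.get("jobs", []):
        stats = job.get("stats", {})
        search_failures = stats.get("search_failures", 0)
        index_failures = stats.get("index_failures", 0)
        if search_failures or index_failures:
            failing.append({
                "id": job.get("config", {}).get("id"),
                "state": job.get("status", {}).get("job_state"),
                "search_failures": search_failures,
                "index_failures": index_failures,
            })
    return failing


# Hypothetical, trimmed response body used only to exercise the helper.
sample_response = {
    "jobs": [
        {
            "config": {"id": "logs_rollup", "page_size": 10000},
            "status": {"job_state": "started"},
            "stats": {"search_failures": 3, "index_failures": 0},
        },
        {
            "config": {"id": "metrics_rollup", "page_size": 500},
            "status": {"job_state": "started"},
            "stats": {"search_failures": 0, "index_failures": 0},
        },
    ]
}

if __name__ == "__main__":
    for job in find_failing_jobs(sample_response):
        print(job)
```

A job that shows up here while still in the `started` state is the situation described above: it looks healthy in the status output even though searches or bulk requests are failing.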
org.elasticsearch.index.mapper.MapperParsingException: Could not dynamically add mapping for field [@timestamp.date_histogram._count]. Existing mapping for [@timestamp] must be of type object but found [date].
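The exception says that the rollup documents need `@timestamp` to be an object (so that sub-fields such as `@timestamp.date_histogram._count` can be added), while a template applied to the rollup index already maps it as `date`. A rough pre-flight check for this could look like the sketch below. It is an assumption-laden illustration, not how Elasticsearch validates anything: `find_conflicting_fields` is a hypothetical helper, the template properties are invented, and only top-level field names are handled.

```python
from typing import Any, Dict, List


def find_conflicting_fields(
    template_properties: Dict[str, Any], rollup_fields: List[str]
) -> List[str]:
    """List rollup fields that a template maps as a leaf type.

    Assumption: the rollup index stores each grouped or metric field as an
    object, so a template mapping with any explicit non-object `type`
    (for example `date`) would clash with it.
    """
    conflicts = []
    for field in rollup_fields:
        mapping = template_properties.get(field)
        if mapping is None:
            continue  # not in the template, so it would be mapped dynamically
        # A mapping without an explicit type is an object mapping.
        field_type = mapping.get("type", "object")
        if field_type != "object":
            conflicts.append(f"{field} (mapped as {field_type})")
    return conflicts


# Hypothetical "properties" section of a template matching the rollup index.
template_properties = {
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
}

if __name__ == "__main__":
    print(find_conflicting_fields(template_properties, ["@timestamp", "bytes"]))
```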