Hi,
I wanted to experiment with the new Rollup API, so I loaded a small chunk of data into a dev cluster and configured a rollup job.
I've set the cron expression to run every minute, but the job stats show it permanently stuck in the started state (not indexing), with pages processed staying at 0.
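In case it helps, the job looks roughly like this (index and field names below are placeholders, not my exact config). Note the cron field takes an Elasticsearch/Quartz-style expression where the first position is seconds, so "every minute" is `0 * * * * ?`:

```
PUT _rollup/job/webserverlogs_rollup_job
{
  "index_pattern": "webserverlogs",
  "rollup_index": "webserverlogs_rollup",
  "cron": "0 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "60m"
    }
  },
  "metrics": [
    { "field": "bytes", "metrics": ["sum", "avg"] }
  ]
}
```

After starting it, the stats come back with `job_state: started` but every counter at zero:

```
POST _rollup/job/webserverlogs_rollup_job/_start
GET _rollup/job/webserverlogs_rollup_job
```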
Ah, I see some exceptions mentioning "data too large" in the Elasticsearch log. I guess I was expecting that kind of info to be surfaced in the job stats response somehow. Will dig into it.
I'm getting a circuit breaker exception, which seems to be the root cause of why my rollups aren't being generated. The funny thing is, no matter what I set the heap size to, the breaker still trips, with the "data too large" amounts scaling accordingly:
Originally the max heap was set to the default of 1GB, which tripped the circuit breaker at around 750MB. After upping it to 4GB, I'm getting this:
```
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [request] Data too large, data for [<reused_arrays>] would be [3200119680/2.9gb], which is larger than the limit of [2495663308/2.3gb]
```
The "raw" webserverlogs index currently holds 117MB of data.
EDIT: Looks like the root cause was that the rollup index was being matched by the index template for the raw index. As far as I can see, rollup jobs expect to create their own mappings on the rollup index. Maybe something to clarify in the documentation.
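For anyone hitting the same thing, here's a minimal sketch of the clash (template and index names are hypothetical, not my exact setup). A template pattern like `webserverlogs*` also matches the rollup index `webserverlogs_rollup`, so the raw-log mappings get applied to it and collide with the mappings the rollup job wants to create:

```
# Too broad: this pattern also catches the rollup index
PUT _template/webserverlogs
{
  "index_patterns": ["webserverlogs*"],
  "mappings": {
    "properties": {
      "message": { "type": "text" }
    }
  }
}
```

Narrowing the pattern (e.g. to `webserverlogs-*`) or naming the rollup index so it falls outside the pattern lets the rollup job manage its own mappings.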