In a very similar vein to "7.5 [reporting] fails randomly" and "Kibana Reporting -csv download fails frequently with error 'Unable to generate report Max attempts reached (3)'", we're currently unable to export CSVs most of the time.
We're on Kibana 7.13.4 with Elasticsearch 7.13.3.
Outside of server.port, server.host and the Elasticsearch connectivity options, our entire kibana.yml is:
monitoring.ui.ccs.enabled: false
xpack.encryptedSavedObjects.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha
xpack.reporting.csv.maxSizeBytes: 512mb
xpack.reporting.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha
xpack.reporting.queue.timeout: 15m
xpack.security.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha
We do have Kibana load balanced across multiple machines. No Docker.
All have exactly the same configuration.
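To double-check that the encryption keys really are identical on every instance, something like the following could be run against a copy of each machine's kibana.yml. This is only a minimal sketch: the function names are hypothetical, the parser assumes flat `key: value` lines like the ones above (not nested YAML), and it compares short SHA-256 fingerprints so the keys themselves never need to be printed.

```python
import hashlib

# The three encryption key settings from the kibana.yml shown above.
KEY_SETTINGS = (
    "xpack.encryptedSavedObjects.encryptionKey",
    "xpack.reporting.encryptionKey",
    "xpack.security.encryptionKey",
)


def parse_flat_yaml(text):
    """Parse simple 'key: value' lines; skips comments and blank lines.

    Assumption: the file uses flat dotted keys, not nested YAML.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def key_fingerprints(text):
    """Return a short SHA-256 fingerprint per key setting (None if unset)."""
    settings = parse_flat_yaml(text)
    return {
        name: (
            hashlib.sha256(settings[name].encode()).hexdigest()[:12]
            if name in settings
            else None
        )
        for name in KEY_SETTINGS
    }


def mismatched_settings(configs):
    """configs maps host name -> kibana.yml text.

    Returns the key settings whose fingerprints differ between hosts.
    """
    per_host = {host: key_fingerprints(text) for host, text in configs.items()}
    return [
        name
        for name in KEY_SETTINGS
        if len({fingerprints[name] for fingerprints in per_host.values()}) > 1
    ]
```

An empty list from `mismatched_settings` would mean every instance has the same value (or the same absence) for all three keys.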
In the kibana log we get things like this.
kvtt491u0ekd2fccc5714yfm - _claimPendingJobs encountered a version conflict on updating pending job kvtt5lzq0ekd2fccc5e41dfi: ResponseError: version_conflict_engine_exception
at onBody (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:337:23)
at IncomingMessage.onEnd (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:264:11)
at IncomingMessage.emit (events.js:387:35)
at endReadableNT (internal/streams/readable.js:1317:12)
at processTicksAndRejections (internal/process/task_queues.js:82:21)
Sometimes Kibana logs an HTTP 409 from Elasticsearch.
All the kibana instances log exactly the same _claimPendingJobs error each time a report fails.
On the kibana reporting page we get told
Error: Failed to decrypt report job data. Please ensure that xpack.reporting.encryptionKey is set and re-generate this report. Error: Unsupported state or unable to authenticate data
or
Max attempts reached (3)
I've been ramping xpack.reporting.queue.timeout up based on "Kibana Reporting -csv download fails frequently with error 'Unable to generate report Max attempts reached (3)'", but the report fails long before the timeout is exceeded.
All 3 attempts fail within 2 or 3 minutes for large reports, or within seconds for tiny reports.
http.max_content_length for Elasticsearch is 576mb.
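As a sanity check on those two sizes, here is a small sketch that converts them to bytes and computes the headroom between them. It assumes both Kibana and Elasticsearch treat "mb" as 1024 * 1024 bytes, and that the intent is for the CSV limit to stay below the Elasticsearch request limit; `to_bytes` is a hypothetical helper, not part of either product.

```python
import re

# Assumption: units are binary multiples (1mb = 1024 * 1024 bytes).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def to_bytes(size):
    """Convert a size string such as '512mb' into a byte count."""
    match = re.fullmatch(r"(\d+)\s*(b|kb|mb|gb)", size.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


csv_max = to_bytes("512mb")  # xpack.reporting.csv.maxSizeBytes
es_max = to_bytes("576mb")   # http.max_content_length
headroom = es_max - csv_max  # bytes left over for the rest of the request

print(f"csv limit: {csv_max}, es limit: {es_max}, headroom: {headroom}")
```

With the values above, the CSV limit sits 64mb below the Elasticsearch limit, so under these assumptions the size settings alone should not be what rejects a 465 byte report.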
To begin with we didn't have xpack.reporting.encryptionKey set. I don't know if that matters. Maybe there is some data structure somewhere that doesn't get updated?
Both large and tiny exports fail.
I've managed to get exports of 3.3mb, 33mb, and one of 65mb. Any time frame longer than those fails every time, and sometimes the report fails at the 33mb or 65mb time frames too.
I've also had an export containing just 13 rows (a timestamp and the single character "-", due to a mistake on my part) fail, but then work fine immediately afterwards (it was 465 bytes long).