CSV report export fails almost all the time

In a very similar vein to "7.5 [reporting] fails randomly" and "Kibana Reporting -csv download fails frequently with error 'Unable to generate report Max attempts reached (3)'", we're currently unable to export CSV reports most of the time.
We're on Kibana 7.13.4 with Elasticsearch 7.13.3.
Outside of server.port, server.host and the Elasticsearch connectivity options, our entire kibana.yml is:

monitoring.ui.ccs.enabled: false
xpack.encryptedSavedObjects.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha
xpack.reporting.csv.maxSizeBytes: 512mb
xpack.reporting.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha
xpack.reporting.queue.timeout: 15m
xpack.security.encryptionKey: aijeeb8eiYoongo7ooy1uquah3aiz1ha

We do have Kibana load balanced across multiple machines. No Docker.
All instances have exactly the same configuration.

In the Kibana log we get things like this:

kvtt491u0ekd2fccc5714yfm - _claimPendingJobs encountered a version conflict on updating pending job kvtt5lzq0ekd2fccc5e41dfi: ResponseError: version_conflict_engine_exception
    at onBody (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:337:23)
    at IncomingMessage.onEnd (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Transport.js:264:11)
    at IncomingMessage.emit (events.js:387:35)
    at endReadableNT (internal/streams/readable.js:1317:12)
    at processTicksAndRejections (internal/process/task_queues.js:82:21)

Sometimes Kibana logs an HTTP 409 from Elasticsearch.
All the Kibana instances log exactly the same _claimPendingJobs error each time a report fails.

On the Kibana reporting page we see either

Error: Failed to decrypt report job data. Please ensure that xpack.reporting.encryptionKey is set and re-generate this report. Error: Unsupported state or unable to authenticate data

or

Max attempts reached (3)

I've been ramping xpack.reporting.queue.timeout up based on "Kibana Reporting -csv download fails frequently with error 'Unable to generate report Max attempts reached (3)'", but the report fails long before the timeout is exceeded.
All 3 attempts fail within 2 or 3 minutes for large reports, or within seconds for tiny reports.

http.max_content_length for Elasticsearch is 576mb.

To begin with we didn't have xpack.reporting.encryptionKey set. I don't know if that matters. Maybe there is some data structure somewhere that doesn't get updated?

The failing exports are both large and tiny.
I've managed to get exports of 3.3mb, 33mb, and one at 65mb. With any longer time frame than those it fails every time, and sometimes the report fails at the 33mb or 65mb time frames too.
I've also had exports containing just 13 rows (a timestamp and the single character "-", due to a mistake on my part) fail, but then work fine immediately afterwards (the result was 465 bytes long).

The _claimPendingJobs version conflict warning can be ignored. See: Reporting troubleshooting | Kibana Guide [7.15] | Elastic
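To illustrate why a version conflict from _claimPendingJobs is harmless by itself, here is a toy model of optimistic concurrency control. It is only a sketch and not Kibana's actual code: `JobStore`, `claim`, `race` and the instance names are hypothetical. Several instances read the same pending job, each tries to update it while stating the version it read, and only the first update succeeds. The others get a conflict, which is what the 409 / version_conflict_engine_exception represents.

```python
# Toy model of optimistic concurrency control; NOT Kibana's implementation.
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller read."""


class JobStore:
    """In-memory stand-in for a single report job document (hypothetical)."""

    def __init__(self):
        self.status = "pending"
        self.version = 1
        self.claimed_by = None

    def read(self):
        return self.status, self.version

    def claim(self, instance, expected_version):
        # The update succeeds only if nobody changed the job since it was read.
        if self.version != expected_version:
            raise VersionConflict(f"{instance}: version conflict")
        self.status = "processing"
        self.claimed_by = instance
        self.version += 1


def race(instances):
    """Every instance reads the pending job first, then each tries to claim it."""
    store = JobStore()
    seen = {name: store.read()[1] for name in instances}
    conflicts = []
    for name in instances:
        try:
            store.claim(name, seen[name])
        except VersionConflict:
            conflicts.append(name)
    return store.claimed_by, conflicts


winner, losers = race(["kibana-1", "kibana-2", "kibana-3"])
print("claimed by:", winner)
print("logged a conflict:", losers)
```

In this model exactly one instance ends up owning the job and the rest only log a conflict, so the conflict messages alone do not explain a failed report.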

The "Failed to decrypt report job data" error means a request to generate a CSV came to one Kibana instance, and the job was claimed by a different instance that has a different encryption key. Find the instance that claimed the job in the Stack Management > Reporting screen. Each report has an info panel with metadata about the report, including the UUID of the Kibana instance that processed the job.
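One way to rule out a key mismatch is to compare xpack.reporting.encryptionKey across the kibana.yml of every instance. The following is a minimal sketch, assuming the setting is written as a flat `key: value` line as in the post above (not nested YAML and not stored in the Kibana keystore). The helper names and the sample configs are hypothetical, and the script prints hashes rather than the keys themselves.

```python
import hashlib


def read_setting(yml_text, key):
    """Return the value of a flat `key: value` line, or None if it is absent."""
    for line in yml_text.splitlines():
        line = line.strip()
        if line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        if name.strip() == key:
            return value.strip().strip("'\"")
    return None


def key_fingerprints(configs, key="xpack.reporting.encryptionKey"):
    """Map each instance name to a short hash of its key (None if unset)."""
    result = {}
    for instance, text in configs.items():
        value = read_setting(text, key)
        if value is None:
            result[instance] = None
        else:
            result[instance] = hashlib.sha256(value.encode()).hexdigest()[:12]
    return result


def keys_consistent(configs, key="xpack.reporting.encryptionKey"):
    """True only if every instance sets the key and all values are identical."""
    prints = set(key_fingerprints(configs, key).values())
    return None not in prints and len(prints) == 1


# Hypothetical sample data: kibana-b is missing the reporting key.
sample = {
    "kibana-a": "xpack.reporting.encryptionKey: example-key-of-at-least-32-chars!!\n",
    "kibana-b": "# xpack.reporting.encryptionKey not set\nserver.port: 5601\n",
}
print(key_fingerprints(sample))
print("consistent:", keys_consistent(sample))
```

In practice the `configs` dictionary would be filled by reading the kibana.yml from each machine behind the load balancer. An instance that reports `None` has no key in its file, which matches the case of a key that was never set.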

What is the amount of RAM on the Kibana instance that processed the job? It may be too low.

While xpack.reporting.encryptionKey was not set, there should have been messages in the server logs stating that a random encryption key was being generated for the instance. Reports that were created with that encryption key cannot be recovered, since the key was ephemeral.
