Confluence (Server) Connector 302 Errors

We're having issues using the Confluence (Server) connector in Workplace Search. It starts indexing documents correctly, but at some point, it gets 302s in the form of:

[####-##-##T##:##:##.###+##:##][######][####][connectors][WARN]: ContentSource[<ID_OMITTED>, confluence_server]: Encountered error during extraction of 'Confluence ID: <ID_OMITTED>': Connectors::ContentSources::Atlassian::CustomClient::ClientError: <html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx/#.##.##</center>
</body>
</html>

I don't actually care if this document doesn't get indexed. However, I do not want the connector to restart the entire job. How can I make it continue despite errors? Our (very old) current search engine has no such issues with our Confluence instance, so I'm not sure why there are no configuration options for the provided Confluence connector.

Hello, Ekansh. :slight_smile:

Which version of Workplace Search are you running?

Let's start there.

Thanks,

Kellen

Hey Kellen!

I'm running Workplace Search 7.10.1 (I believe that is the latest one?).

I know why the Confluence pages are erroring now - they can't be loaded in the browser either due to some macros. However, I still want Workplace Search to ignore those errors and continue indexing.

Hi Ekansh!

There's a known issue in Confluence regarding 302 redirects. We will try to handle it more gracefully in future versions.

For now, you could use several config options in config/enterprise-search.yml to increase tolerance to errors:

    #workplace_search.content_source.sync.max_errors: 1000
    #
    # Configure how many errors in a row to tolerate in a sync job.
    # If the job encounters more errors in a row than this value, the job will fail.
    # NOTE: this only applies to errors tied to individual documents.
    #
    #workplace_search.content_source.sync.max_consecutive_errors: 10
    #
    # Configure the ratio of <errored documents> / <total documents> to tolerate in a sync job
    # or in a rolling window (see `workplace_search.content_source.sync.error_ratio_window_size`).
    # If the job encounters an error ratio greater than this value in a given window, or overall
    # at the end of the job, the job will fail.
    # NOTE: this only applies to errors tied to individual documents.
    #
    #workplace_search.content_source.sync.max_error_ratio: 0.15
    #
    # Configure how large of a window to consider when calculating an error ratio
    # (see `workplace_search.content_source.sync.max_error_ratio`).
    #
    #workplace_search.content_source.sync.error_ratio_window_size: 100
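
To make the interaction between these settings concrete, here is a rough Python model of the tolerance rules as the comments above describe them. It is only a sketch, not Workplace Search's actual code: the names `SyncTolerance` and `job_fails` are made up for illustration, `max_errors` is assumed to be a cap on the total number of document errors (its description is not shown above), and the rolling ratio is assumed to be checked only once a full window of documents has been seen.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTolerance:
    # Defaults mirror the commented-out values in the config snippet.
    max_errors: int = 1000              # assumed: cap on total document errors
    max_consecutive_errors: int = 10
    max_error_ratio: float = 0.15
    error_ratio_window_size: int = 100


def job_fails(outcomes, tol=SyncTolerance()):
    """Return the name of the limit that fails the job, or None if it survives.

    `outcomes` is an iterable of booleans; True means the document errored.
    """
    total = errors = consecutive = 0
    window = deque(maxlen=tol.error_ratio_window_size)
    for errored in outcomes:
        total += 1
        window.append(errored)
        if errored:
            errors += 1
            consecutive += 1
        else:
            consecutive = 0
        if errors > tol.max_errors:
            return "max_errors"
        if consecutive > tol.max_consecutive_errors:
            return "max_consecutive_errors"
        # Assumption: the rolling ratio is only judged once the window is full.
        window_full = len(window) == window.maxlen
        if window_full and sum(window) / len(window) > tol.max_error_ratio:
            return "max_error_ratio (window)"
    # Overall ratio check at the end of the job.
    if total and errors / total > tol.max_error_ratio:
        return "max_error_ratio (overall)"
    return None


# Eleven failures in a row exceed the default limit of 10.
print(job_fails([True] * 11))
# One failure in every five documents exceeds the default ratio of 0.15.
print(job_fails([True, False, False, False, False] * 40))
```

Under this model, setting `max_error_ratio` to 1 means the ratio check can never trip, because a ratio cannot be greater than 1; only the two count-based limits would remain in play.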

Let us know if it helps!

Hi Vadim - thanks for finding the issue!

Yesterday, I went ahead and updated the configuration as follows:

    workplace_search.content_source.sync.max_errors: 10000000
    workplace_search.content_source.sync.max_consecutive_errors: 100000
    workplace_search.content_source.sync.max_error_ratio: 1
    workplace_search.content_source.sync.error_ratio_window_size: 100

I don't think we have nearly this number of errors, so the job shouldn't stop. It might be the case that the UI simply reports these errors and the job still finishes indexing everything, but I'm skeptical because indexing still runs every 15-30 minutes instead of at the documented 2-hour interval.

Do you know if this is simply a UI issue or if the indexing is actually restarting more often than necessary (or how I can check)? I also wonder if maybe my parameters are wrong.

Thanks for the help!

Any updates on this?