Hello,
I am trying to index a set of PDF documents using the web crawler. The deployment is on GCP, and the PDF documents are listed in the sitemap, which is referenced in the robots.txt file. I am not using the workspace solution. Do I need to define an attachment processor in the ingestion pipeline? Thanks
The log explorer shows the following message for each PDF document in the sitemap:
Unexpected content type application/pdf for a crawl task with type=content
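To confirm which sitemap entries trigger this message, a minimal sketch like the one below can list the PDF URLs in a sitemap. It only parses the sitemap XML and filters on the `.pdf` extension; it does not call the crawler or change its configuration. The function name `pdf_urls_from_sitemap` and the example URLs are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def pdf_urls_from_sitemap(sitemap_xml):
    """Return the <loc> URLs in a sitemap whose path ends in .pdf."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # Ignore any query string when checking the file extension.
    return [u for u in urls if u.lower().split("?", 1)[0].endswith(".pdf")]


# Hypothetical sitemap content, standing in for the real file.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/index.html</loc></url>
  <url><loc>https://example.com/docs/guide.pdf</loc></url>
  <url><loc>https://example.com/docs/Report.PDF?version=2</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in pdf_urls_from_sitemap(EXAMPLE_SITEMAP):
        print(url)
```

The resulting list can be compared against the URLs reported in the log explorer to check that every failing crawl task corresponds to a PDF entry.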
I believe the issue should be resolved as per the documentation specified here:
I will update the web crawler configuration and provide an update.