It looks like the product itself is already capable of extracting text from attachments (e.g. PDF, PPTX, DOCX). Is there a way to have the product do the same for custom content sources? Is there anything on the roadmap?
We use Apache Tika under the hood to extract text from eligible file formats. We've talked about allowing users to send raw files to the Custom Source API, but as far as I'm aware we haven't committed it to a roadmap. If you're a customer with a support relationship with us, I'd encourage you to submit an enhancement request if this is an important feature for your use case. In the meantime, I'd encourage you to look at using Tika or something similar to extract text and metadata from your raw files before indexing them.
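
To illustrate the Tika suggestion, here is a minimal sketch in Python that sends raw file bytes to a standalone Tika Server over HTTP and gets back plain text and metadata. It assumes you run Tika Server yourself on its default port (9998); the helper names and the `id`/`body`/`metadata` fields in the resulting document are hypothetical, so map them to whatever schema your custom source actually uses.

```python
import json
import urllib.request

# Assumption: a Tika Server instance you started yourself, on its default port.
TIKA_URL = "http://localhost:9998"


def build_tika_request(endpoint, data, accept, base_url=TIKA_URL):
    """Build a PUT request that uploads raw file bytes to a Tika Server endpoint."""
    return urllib.request.Request(
        f"{base_url}/{endpoint}",
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(data, base_url=TIKA_URL):
    """Return the plain text Tika extracts from the file bytes (PUT /tika)."""
    req = build_tika_request("tika", data, "text/plain", base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(data, base_url=TIKA_URL):
    """Return the metadata Tika detects for the file as a dict (PUT /meta)."""
    req = build_tika_request("meta", data, "application/json", base_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_document(doc_id, data, base_url=TIKA_URL):
    """Combine text and metadata into one dict.

    The field names here are hypothetical, not the Custom Source API schema.
    """
    return {
        "id": doc_id,
        "body": extract_text(data, base_url),
        "metadata": extract_metadata(data, base_url),
    }


# Example usage (requires a running Tika Server):
#   with open("report.pdf", "rb") as f:
#       doc = to_document("report-1", f.read())
#   # then index `doc` through your custom source's document API
```

Running extraction as a separate step like this keeps the custom source payload text-only, which is what the API accepts today according to the answer above.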