I'm interested in building a software system that connects to various document sources, extracts the content of the documents in each source, and makes that content available to a search engine such as Elasticsearch. The search engine will serve as the back end for a web-based search application.
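To make the intended pipeline concrete, here is a minimal sketch of the extract-then-index flow, not a real implementation. The names `ExtractedDocument`, `extract_text`, and `to_bulk_lines`, the index name `documents`, and the field names are all hypothetical. Extraction is stubbed out for plain text only, and no search client is called; the sketch only builds the newline-delimited JSON that Elasticsearch's bulk endpoint accepts.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedDocument:
    """Hypothetical record for one document pulled from a source."""
    doc_id: str
    source: str
    content_type: str
    text: str


def extract_text(raw: bytes, content_type: str) -> str:
    """Placeholder extractor (assumption): handles plain text only.

    A real system would delegate Word/PDF extraction to a dedicated library.
    """
    if content_type == "text/plain":
        return raw.decode("utf-8", errors="replace")
    raise NotImplementedError(f"no extractor for {content_type}")


def to_bulk_lines(doc: ExtractedDocument, index_name: str) -> str:
    """Build the action line and the source line for one bulk 'index' operation."""
    action = {"index": {"_index": index_name, "_id": doc.doc_id}}
    body = {
        "source": doc.source,
        "content_type": doc.content_type,
        "content": doc.text,
    }
    # The bulk format is newline-delimited JSON, ending with a newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


if __name__ == "__main__":
    doc = ExtractedDocument(
        doc_id="1",
        source="file-share",
        content_type="text/plain",
        text=extract_text(b"Quarterly report draft", "text/plain"),
    )
    print(to_bulk_lines(doc, "documents"), end="")
```

Keeping the original content type and source on each indexed record means the search front end can later decide which kind of preview to request for a given hit.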
I'd like to render snippets of these documents in the search results for well-known types, such as Microsoft Word and PDF. How would one go about implementing document snippet rendering in search?
I'd be happy with serving up these snippets in any format, including as images. I just want to be able to give my users some kind of formatted preview of their results for well-known types.