Please format your code, logs, or configuration files using the </>
icon, as explained in this guide, rather than the citation button. It will make your post more readable.
Or use markdown style like:
```
CODE
```
There's a live preview panel for exactly this reason.
Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.
As for your question: this is simply how this plugin works. It is meant to index whatever binary files you have (doc, pdf, xml, txt...) by extracting their text and metadata.
If your documents are all XML files, then you can use something like Logstash to parse them and generate JSON documents.
FSCrawler might help as well. See https://github.com/dadoonet/fscrawler#indexing-xml-docs
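To give a rough idea of what "parse XML and generate JSON documents" means, here is a minimal Python sketch using only the standard library. It is not how Logstash or FSCrawler do it internally. The function names (`element_to_dict`, `xml_to_json`), the `@` prefix for attributes and the `#text` key are my own assumptions for illustration, and the sample XML is made up.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into a string or a dict."""
    children = list(elem)
    text = (elem.text or "").strip()

    # A leaf element without attributes becomes a plain string.
    if not children and not elem.attrib:
        return text

    node = {}
    # Attributes are stored with an "@" prefix (illustrative convention).
    for name, value in elem.attrib.items():
        node["@" + name] = value

    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    # Text that sits next to attributes or children goes under "#text".
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return it as a JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)})


if __name__ == "__main__":
    sample = "<book id='1'><title>Example</title><author>A</author><author>B</author></book>"
    print(xml_to_json(sample))
```

The resulting JSON string could then be sent to Elasticsearch as a regular document, so each XML element ends up as its own field instead of one extracted text blob.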