Reading through the thread, it seems to me that we should actually take a step back and first understand what exactly you want to do with these documents once they are indexed:
- What types of queries do you want to execute against the finished index?
- What types of updates will have to be applied to your documents, if any?
Isabel