I have a legacy system that stores documents of each language in their own index. For example, the English index stores all documents whose language is English, and the German index stores all the German documents. This was probably done to improve relevance, since storing documents of all languages in the same index can cause relevance issues.
Our system has the following functionality:
- A document is created and indexed in only one language, but a system-wide search queries all the language-based indices, because that search query is not language-specific.
- A user can write English content in a German category, and it will be indexed in the German index without being translated to German. In some use cases, however, the search query is specific to one language.
- The language of a category or a document can be changed at any time. This currently involves deleting all the affected documents from the current index and re-creating them in the index of the new language. (This causes a lot of bugs in the system, such as duplicate records and documents that miss the update.)
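To make the current behaviour concrete, here is a minimal in-memory sketch of it. It is only a toy model, not our real code: the `indices` dictionary stands in for the per-language indices, and the function names (`index_doc`, `search_all`, `change_category_language`) and field names are made up for illustration.

```python
from collections import defaultdict

# language -> {doc_id -> document}; each inner dict stands in for one index
indices = defaultdict(dict)


def index_doc(doc_id, category, language, content):
    """Index a document into the index of its (category) language only."""
    indices[language][doc_id] = {"category": category, "content": content}


def search_all(term):
    """System-wide search: every language index is queried."""
    return sorted(
        doc_id
        for docs in indices.values()
        for doc_id, doc in docs.items()
        if term.lower() in doc["content"].lower()
    )


def change_category_language(category, old_language, new_language):
    """Move a category: create in the new index, then delete from the old one."""
    moved = [
        (doc_id, doc)
        for doc_id, doc in indices[old_language].items()
        if doc["category"] == category
    ]
    for doc_id, doc in moved:
        indices[new_language][doc_id] = doc  # step 1: create in the new index
        # If the process fails between the two steps, the document exists
        # in both indices, which is the kind of duplicate we see today.
        del indices[old_language][doc_id]    # step 2: delete from the old index
    return len(moved)
```

The point of the sketch is that a language change is a two-step move per document across two indices, so with millions of documents any partial failure can leave duplicates or missed documents behind.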
We want to redesign the system because of the many bugs and performance issues that occur when an admin changes the language of a category that has millions of documents. Our idea is to get rid of the separate per-language indices, store the current language as a field on each document, and update that field when the language changes. However, we are not sure how this would affect relevance in general, or what the industry recommendations are for this situation.
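A rough sketch of what the single-index idea could look like, assuming an Elasticsearch-style query DSL (the engine, the update-by-query style request, and the field names `category_id`, `language`, and `content` are all assumptions for illustration, not our actual schema):

```python
def language_change_request(category_id, new_language):
    """Body for an update-by-query style request.

    Instead of deleting and re-creating documents, the `language` field of
    every document in the category is updated in place.
    """
    return {
        "query": {"term": {"category_id": category_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.language = params.language",
            "params": {"language": new_language},
        },
    }


def search_request(text, language=None):
    """Body for a search request against the single shared index.

    With language=None this is the system-wide search; otherwise the
    results are restricted to one language with a non-scoring filter.
    """
    query = {"bool": {"must": [{"match": {"content": text}}]}}
    if language is not None:
        query["bool"]["filter"] = [{"term": {"language": language}}]
    return {"query": query}
```

For example, `search_request("hello")` would cover all languages, while `search_request("hello", "de")` would cover only documents currently marked as German. What this sketch does not address is analysis: with one shared `content` field, all languages would go through the same analyzer, which is exactly the part of the relevance question we are unsure about.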