I'm developing a program that uses Elasticsearch as its search engine.
I'd be interested in the community's opinion on how to structure my data.
The project is quite simple: we have documents (a document can exist in several languages and have several versions) that are categorized and carry dynamic metadata that depends on their category.
I have already done some tests with the Elasticsearch ingest attachment processor to send the document and parse it directly.
But I do not see how to deal with metadata that is dynamic according to the category of the document.
I prefer the latter form, but I'd use 2 indices: my_index_fr and my_index_en, unless there is a need for an absolute relationship between both versions of the same document.
Well. It reduces the number of fields within one index. Not a big deal here I guess, as you only have a few of them.
What I'd think about is "reindex" needs. If something goes wrong with a specific language, e.g. FR, and you need to change the text analyzer (which means reindexing), do you want to reindex both languages or only one?
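To make the reindex point concrete, here is a minimal sketch of the two-index layout. It only builds the create-index bodies as Python dicts and sends nothing to a cluster. It assumes the extracted text lives in `attachment.content` (the default target of the ingest attachment processor) and uses the built-in `french`, `english` and `standard` analyzers. The helpers `index_body` and `changed_indices` are hypothetical names made up for this illustration.

```python
def index_body(analyzer):
    """Build a create-index body whose extracted text uses the given analyzer."""
    return {
        "mappings": {
            "properties": {
                "attachment": {
                    "properties": {
                        "content": {"type": "text", "analyzer": analyzer}
                    }
                }
            }
        }
    }


def changed_indices(old, new):
    """Return the names of the indices whose body differs, i.e. those to reindex."""
    return sorted(name for name in new if old.get(name) != new[name])


# One index per language, each with its own analyzer.
indices = {
    "my_index_fr": index_body("french"),
    "my_index_en": index_body("english"),
}

# Suppose only the FR analyzer has to change (here to "standard", purely as an example).
updated = dict(indices, my_index_fr=index_body("standard"))

to_reindex = changed_indices(indices, updated)
```

With this layout `to_reindex` contains only `my_index_fr`, so the English documents stay untouched. With a single index holding both languages, the same analyzer change would mean reindexing everything.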
Dummy question about "PUT _ingest/pipeline/attachment": do I need to call it only once, or before each "PUT my_index/_doc/my_id?pipeline=attachment", in a separate request?
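For what it's worth, an ingest pipeline is stored in the cluster once it has been created, so the "PUT _ingest/pipeline/attachment" call is made once and each indexing request then only references the pipeline by name. A rough sketch of the resulting request sequence follows. It only builds (method, path, body) tuples and does not talk to a cluster. The source field name `data`, the second document id and the helper names are assumptions for the example.

```python
import base64

PIPELINE_ID = "attachment"


def pipeline_request():
    """Request that registers the pipeline; sent once, not once per document."""
    body = {
        "description": "Extract attachment information",
        # "data" is an assumed field name holding the base64-encoded file.
        "processors": [{"attachment": {"field": "data"}}],
    }
    return ("PUT", f"_ingest/pipeline/{PIPELINE_ID}", body)


def index_request(index, doc_id, raw_bytes):
    """Request that indexes one document through the already registered pipeline."""
    body = {"data": base64.b64encode(raw_bytes).decode("ascii")}
    return ("PUT", f"{index}/_doc/{doc_id}?pipeline={PIPELINE_ID}", body)


# One pipeline definition, followed by one indexing request per document.
requests_to_send = [pipeline_request()]
for doc_id, raw in [("my_id", b"first document"), ("my_id_2", b"second document")]:
    requests_to_send.append(index_request("my_index", doc_id, raw))
```

The pipeline only needs to be sent again if its definition changes.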