Each JSON document has metadata (status, type, name, description, size) and the article content itself.
The metadata is small, around 20 KB. The article content is larger and can range anywhere from 100 KB to 100 MB. The content is never updated, but the metadata is updated (at most 5-6 times in a document's lifetime).
We need to search the content and metadata together, and the Elasticsearch documentation advises: "A single document should contain all of the information that is required to decide whether it matches a search request."
So I want to know how to model this document.
There are several possible approaches:
- Should we keep the metadata and content together, considering this extract on how updates work in Elasticsearch?
  "Consequently, updating a previously indexed document is a delete followed by a re-insertion of the document. Note that this means that updating a document is even more expensive than adding it in the first place. Thus, storing things like rapidly changing counters in a Lucene index is usually not a good idea – there is no in-place update of values."
- Also, since the content size can range from KBs to MBs, should we keep the entire content in one document, or should we split it into smaller segments and index those?
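
To make the splitting option concrete, this is roughly what I have in mind. It is only a sketch under my own assumptions: the helper names and the fields `doc_id`, `chunk_seq` and `chunk_id` are placeholders I made up, not anything Elasticsearch prescribes, and it cuts on a fixed character count purely for simplicity (a real version would probably split on paragraph or sentence boundaries).

```python
from typing import Dict, Iterator, List


def split_content(content: str, chunk_chars: int) -> Iterator[str]:
    """Yield consecutive slices of content, each at most chunk_chars long."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    for start in range(0, len(content), chunk_chars):
        yield content[start:start + chunk_chars]


def build_chunk_documents(
    doc_id: str,
    metadata: Dict[str, object],
    content: str,
    chunk_chars: int,
) -> List[Dict[str, object]]:
    """Build one indexable document per chunk.

    Every chunk carries a full copy of the metadata, so a single chunk
    document has everything needed to decide whether it matches a search.
    """
    return [
        {
            **metadata,                     # hypothetical: status, type, name, ...
            "doc_id": doc_id,               # groups chunks of the same article
            "chunk_seq": seq,               # position of the chunk in the article
            "chunk_id": f"{doc_id}#{seq}",  # candidate unique id for the chunk
            "content": chunk,
        }
        for seq, chunk in enumerate(split_content(content, chunk_chars))
    ]
```

With this layout, a metadata change would have to be applied to every chunk that shares the same `doc_id`, so the trade-off seems to be several small rewrites versus one rewrite of a very large document.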
Are there any other approaches generally followed for this kind of requirement?