I am in the process of creating an index that can have a lot of nested documents. My document structure will look something like:
```json
{
  "personid": 1,
  "comments": [
    {},
    {},
    {},
    ...
  ]
}
```
Each person can have multiple comments. Is each comment considered its own nested document? What problems will I run into if I set up a person index where each person document can have as many as 10,000 comments?
Having documents that contain a lot of nested objects can make updates quite expensive. Each nested object in a document is indexed as a separate document behind the scenes, and whenever anything in the document is updated, all of those hidden documents are reindexed along with it. If you have a relatively static dataset this may be fine, but you may want to consider breaking up and flattening parts of your data model.
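To make the cost concrete, here is a minimal Python sketch (no Elasticsearch client involved) that counts how many underlying Lucene documents one person document turns into. It assumes `comments` is mapped with `"type": "nested"`, as in the hypothetical `PERSON_MAPPING` below, and that the comments contain no further nested objects. The `text` field inside each comment is a made-up placeholder.

```python
# Hypothetical mapping for the person index: declaring "comments" as type
# "nested" is what makes each comment a separate hidden document.
PERSON_MAPPING = {
    "mappings": {
        "properties": {
            "personid": {"type": "long"},
            "comments": {"type": "nested"},
        }
    }
}


def lucene_docs_written(person: dict) -> int:
    """Lucene documents written each time `person` is (re)indexed.

    Assumes every comment is a flat object: one hidden document per
    comment, plus one document for the root person itself.
    """
    return len(person.get("comments", [])) + 1


person = {
    "personid": 1,
    "comments": [{"text": f"comment {i}"} for i in range(10_000)],
}
print(lucene_docs_written(person))  # 10001
```

So with 10,000 comments, touching any single field of that person means writing roughly 10,001 documents again.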
@Christian_Dahlqvist Thanks for the response. Would having a lot of nested documents in the index affect query performance?
My dataset is not completely static, but it does not change much either. The personid field is also the document ID. I was thinking of replacing the entire document whenever a comment is added to a person. Is this an unreasonable or unmaintainable approach?
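A rough way to judge the full-replace idea is to count the total work. This is a minimal sketch under one assumption: comments arrive one at a time and each arrival triggers a full replace of the person document (for example a `PUT /<index>/_doc/<personid>` in recent Elasticsearch versions; `add_comment` and `total_docs_written` are hypothetical helpers, not a client API). Adding the k-th comment rewrites k nested documents plus the root, so the total grows quadratically.

```python
import copy


def add_comment(person: dict, comment: dict) -> dict:
    """Return a new copy of `person` with `comment` appended.

    Under the full-replace approach this whole document is sent back to
    Elasticsearch, so every existing nested comment is reindexed too.
    """
    updated = copy.deepcopy(person)
    updated.setdefault("comments", []).append(comment)
    return updated


def total_docs_written(n_comments: int) -> int:
    """Total Lucene docs written when comments are added one by one.

    Adding the k-th comment writes k nested docs + 1 root doc, so the
    total is sum_{k=1..n} (k + 1) = n * (n + 3) / 2 (always an integer).
    """
    return n_comments * (n_comments + 3) // 2


# Simulate 100 single-comment replaces and tally the documents written.
person = {"personid": 1, "comments": []}
written = 0
for i in range(100):
    person = add_comment(person, {"text": f"comment {i}"})
    written += len(person["comments"]) + 1  # nested docs + root doc

print(written, total_docs_written(100))  # 5150 5150
print(total_docs_written(10_000))        # 50015000
```

Growing one person to 10,000 comments this way writes about 50 million documents in total, which is why batching comment additions or splitting comments out of the person document may be worth considering if comments arrive frequently.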