1st level documents are the most numerous (about 1000 a day)
Each 1st level document has up to 3 2nd level documents
Each 2nd level document has up to 4 3rd level documents
Each 3rd level document has only one 4th level document (the two could perhaps be merged so the 4th level is not needed)
The documents are immutable, but as I said, sub-level documents are attached to their parents as the data becomes available (from a couple of hours to a couple of days later). A hedged mapping sketch follows below.
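To make the hierarchy concrete, here is a minimal sketch of one way such a four-level structure could be modelled in Elasticsearch with a single join field. This is not the poster's actual mapping: the index name ("docs"), the relation names ("level1" to "level4"), the example properties, and the use of the elasticsearch-py 7.x client are all assumptions for illustration.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Hypothetical mapping: one join field describes the whole parent/child
# chain level1 -> level2 -> level3 -> level4.
mapping = {
    "mappings": {
        "properties": {
            "doc_relation": {
                "type": "join",
                "relations": {
                    "level1": "level2",
                    "level2": "level3",
                    "level3": "level4",
                },
            },
            # Example properties; the real field names are not given in the post.
            "title": {"type": "text"},
            "status": {"type": "keyword"},
            "created_at": {"type": "date"},
        }
    }
}

es.indices.create(index="docs", body=mapping)

# A level-2 document arriving hours or days later is indexed as a child of
# its (immutable) level-1 parent; routing must point at the level-1 ancestor
# so the whole hierarchy stays on one shard.
es.index(
    index="docs",
    id="L2-0001",
    routing="L1-0001",
    body={
        "title": "second level data",
        "doc_relation": {"name": "level2", "parent": "L1-0001"},
    },
)
```

One advantage of this join-field approach over nested documents here is that children can be added later without reindexing the already-indexed parent, which fits the "attached when data becomes available" constraint.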
That is only 12,000 (24,000 if levels 3 and 4 are not merged) documents per day, if I have calculated correctly. At that scale, I suspect the best approach will depend on how you need to query the data.
Our query results are the complete hierarchies of the first-level documents matching the specified search filters.
These search filters can be:
a full-text criterion on every property of every document at every level
and/or specific criteria (a constant-score term or a date range) on some properties of first-level documents, other properties of second-level documents, and so on for each level (see the query sketch after this list)
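As a rough illustration of those filters, here is a hedged sketch of one possible query shape, continuing the assumed index, relation, and field names from the mapping sketch above (none of which come from the original post). It combines a full-text clause and constant-score term/date-range filters on level-1 properties with a has_child clause that pushes criteria down to level-2 and, nested inside it, level-3 documents, using inner_hits to bring back the matching children.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()

query = {
    "query": {
        "bool": {
            "must": [
                # Full-text criterion on level-1 properties.
                {"multi_match": {"query": "invoice shipment", "fields": ["*"]}},
                # Criteria on level-2 children, themselves constrained by level-3 children.
                {
                    "has_child": {
                        "type": "level2",
                        "inner_hits": {},  # return the matching level-2 documents
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"status": "validated"}},
                                    {
                                        "has_child": {
                                            "type": "level3",
                                            "inner_hits": {},
                                            "query": {
                                                "range": {"created_at": {"gte": "now-7d"}}
                                            },
                                        }
                                    },
                                ]
                            }
                        },
                    }
                },
            ],
            "filter": [
                # Constant-score criteria on level-1 properties.
                {"term": {"status": "open"}},
                {"range": {"created_at": {"gte": "2023-01-01", "lte": "2023-12-31"}}},
            ],
        }
    }
}

hits = es.search(index="docs", body=query)
```

This is only a sketch of the shape such a query could take; assembling the "complete hierarchy" in the results would rely on the inner_hits blocks (or a follow-up fetch by routing key), and deeply nested has_child clauses have a query-time cost worth measuring at the stated volumes.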