Hi there,
We have a scenario where the documents we index may be present in multiple folders with the same content. At indexing time we can easily detect these duplicates, because we have a content hash. The duplication factor we sometimes see is >20. We don't store _source; we store only a few fields.
In the normal case (no duplicates) the size of the index is reasonable. When we move to nested or parent-child documents and start storing _source, the index grows to more than 5 times that size and query performance also slows down.
Say that, for the time being, we don't care much about the number of documents in the index or the size of the index. Is it fine to do no de-duplication and store the redundant data in ES, compared to taking either the nested or the parent-child approach?
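To make the two options concrete, here is a rough Python sketch of the document shapes being compared. The field names (`path`, `content_hash`, `locations`, `title`) and the use of SHA-256 are illustrative assumptions only, not our actual mapping, and no Elasticsearch calls are shown.

```python
import hashlib


def content_hash(content: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever content hash is really used.
    return hashlib.sha256(content).hexdigest()


def flat_docs(paths, content, fields):
    """Redundant approach: one flat document per folder path.

    The same stored fields are repeated in every copy.
    """
    h = content_hash(content)
    return [{"path": p, "content_hash": h, **fields} for p in paths]


def nested_doc(paths, content, fields):
    """Nested approach: one document per unique content.

    Every folder path becomes an entry in a hypothetical 'locations' list.
    """
    return {
        "content_hash": content_hash(content),
        **fields,
        "locations": [{"path": p} for p in paths],
    }


paths = ["/a/report.txt", "/b/report.txt", "/c/report.txt"]
fields = {"title": "report"}

flat = flat_docs(paths, b"same bytes", fields)
nested = nested_doc(paths, b"same bytes", fields)
```

With a duplication factor of 20, the flat variant would produce 20 small documents sharing one hash, while the nested variant would produce a single document with 20 entries under `locations`.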
P.S. Our overall document size is far larger than the size of the stored fields.
Regards,
Imran