I'm migrating from a v7.17.10 cluster to v8.8.2 using the reindex API. Some indices holding data from ingested files (mostly docx and pdf) take up only 5% of the space they used to. I've done a little spot checking and the data does seem to be searchable, so I'm wondering why the size dropped so much. Indices with ordinary (non-file) data take up about 80-90% of the space they used to.
Is this expected? I'm struggling to understand how this is possible, because I don't think even compressing the files themselves would yield savings that large. I would also think that if the extra space came from my own dirty data, it would just carry straight over in the reindex.
If this is expected, about what final size should I plan for based on the v7 size?
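
In case it helps anyone reproduce the comparison, here is a minimal sketch of how the before/after sizes could be compared. It assumes you have already fetched and JSON-parsed the index stats response (`GET <index>/_stats/docs,store`) for the same index on both clusters. The function names `summarize` and `compare` are hypothetical helpers, not part of any client library, and the sketch assumes both indices are non-empty. It reads only primary-shard numbers so that replica counts don't skew the ratio, and it reports the old index's deleted-doc count, since that is one thing a reindex would not carry over.

```python
def summarize(stats):
    """Extract primary-shard doc counts and store size from a parsed
    index stats response (assumed shape: _all -> primaries -> docs/store)."""
    primaries = stats["_all"]["primaries"]
    return {
        "docs": primaries["docs"]["count"],
        "deleted": primaries["docs"]["deleted"],
        "bytes": primaries["store"]["size_in_bytes"],
    }


def compare(old_stats, new_stats):
    """Compare the same index before and after a reindex.
    Assumes both indices contain at least one document."""
    old, new = summarize(old_stats), summarize(new_stats)
    return {
        # Live doc counts should match if the reindex copied everything.
        "doc_count_match": old["docs"] == new["docs"],
        # Deleted-but-not-yet-merged docs in the old index still use disk.
        "old_deleted_docs": old["deleted"],
        # New size as a fraction of the old size (0.05 means 5%).
        "size_ratio": new["bytes"] / old["bytes"],
        "old_bytes_per_doc": old["bytes"] / old["docs"],
        "new_bytes_per_doc": new["bytes"] / new["docs"],
    }
```

If the doc counts match and bytes per doc still drops sharply, the difference is in how each document is stored rather than in missing documents.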