For attachment data, is there a practical size limit to the content of the attachment?
I wish to add multiple attachments to an index in a nested sub-object and there may be a lot of data.
Is there a point where it is more practical to have an index (or multiple indices) of just attachment data?
Even for just the extracted text, the data could be large. I have pumped some 40 MB PDFs through the attachment ingest pipeline and they end up with 6-8 MB of text.
I'm just looking for any indication from experience of the point at which this is no longer a good strategy, whether because search performance degrades, the indexed JSON objects become unwieldy, or anything else ...
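For context, the setup is roughly the following. This is only a minimal sketch, assuming a local cluster and made-up index/pipeline names (docs, attachments); it runs an attachment processor over each element of an attachments array via a foreach processor:

```python
import base64
import requests

ES = "http://localhost:9200"  # assumed local cluster; names below are illustrative

# One attachment processor per element of the "attachments" array,
# writing the extracted text to attachment.content on each element.
pipeline = {
    "description": "Extract text from an array of attached files",
    "processors": [
        {
            "foreach": {
                "field": "attachments",
                "processor": {
                    "attachment": {
                        "field": "_ingest._value.data",
                        "target_field": "_ingest._value.attachment",
                        "indexed_chars": -1,
                    }
                },
            }
        }
    ],
}
requests.put(f"{ES}/_ingest/pipeline/attachments", json=pipeline).raise_for_status()

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# A 40 MB PDF typically shrinks to a few MB of extracted text, but the
# base64 input stays in _source too unless it is removed in the pipeline.
doc = {"title": "reports", "attachments": [{"data": b64("a.pdf")}, {"data": b64("b.pdf")}]}
requests.put(f"{ES}/docs/_doc/1?pipeline=attachments", json=doc).raise_for_status()
```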
Sorry for the late reply.
In general, it's not efficient to store very big JSON documents: every time you search, by default the _source of the top 10 hits is read from disk.
That consumes time, network bandwidth...
Also, when Lucene segments need to be merged, you end up with a lot of disk I/O.
Same when a node leaves the cluster: a lot of data has to be copied over the network.
Keeping JSON documents as small as possible is better IMO.
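As a minimal sketch of the fetch side (assuming the docs index and field names from the pipeline above), you can at least keep the big extracted text out of what is returned with each hit by excluding it from _source:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster; index name is illustrative

# Search the extracted text but exclude it from the returned _source,
# so the large attachment content is not shipped back with every hit.
query = {
    "query": {"match": {"attachments.attachment.content": "lucene"}},
    "_source": {"excludes": ["attachments.attachment.content"]},
}
resp = requests.post(f"{ES}/docs/_search", json=query)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("title"))
```

Note that source filtering only trims what goes over the wire; Elasticsearch still reads the whole _source from disk before filtering it, which is part of why splitting very large content into its own index (or its own smaller documents) can help.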