Within Enterprise Search Engine, data ingestion/Storage

Hello there,

I'm new to Elastic Enterprise Search and have a question:
If the source data contains pictures/images, does Elastic ingest the images themselves and store them in the search engine's shards, or does it store just the links to them? If the blobs are stored/indexed on shards, they would take up a lot of space. Is there any documentation that covers this?

Please advise

Thanks

Li

Hi @lcui_dxc ,

It depends on which feature you're using to ingest data with Enterprise Search.

If you are using Workplace Search Content Sources, see: Content extraction | Workplace Search Guide [8.3] | Elastic
Workplace Search makes every effort to process binary files. Files that are "text like" (office documents, PDFs, HTML, etc.) have their text extracted. Further, an attempt is made to generate thumbnail images for office documents and image documents. Rather than store the full-sized image, we store only two small copies of the thumbnail, which saves significant space. This feature can be disabled if space is still a concern, though doing so removes thumbnails from the default search experience. Other than the thumbnail images, no binary content is persisted in Elasticsearch. We do index links to the original document (image or otherwise).

If you are using the App Search Web Crawler, see: Web crawler reference | Elastic App Search Documentation [8.3] | Elastic. In the App Search Web Crawler, we attempt to extract text from binary documents, similar to what is attempted in Workplace Search. However, we do not generate thumbnails today, nor do we persist binary content for any files. We do index a link to the original document, image or otherwise.

If you're using any of our Ingestion APIs (Elasticsearch Index API, App Search documents API, Workplace Search Custom Source API, etc.), then we'll index whatever data you send us. Most folks do not index binary documents, but instead process binaries on their end before sending data to our APIs for ingestion.
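
As a rough illustration of processing binaries on your end, here is a minimal Python sketch that describes an image with a link and some metadata rather than embedding its bytes. The function name, the field names (`title`, `url`, `size_bytes`, `sha256`), and the example URL are all hypothetical, not part of any Elastic schema; the resulting JSON is simply the kind of small document you might then send to one of the ingestion APIs.

```python
import hashlib
import json


def build_image_document(image_bytes: bytes, source_url: str, title: str) -> dict:
    """Describe an image without embedding its binary content.

    All field names here are illustrative assumptions, not an Elastic schema.
    """
    return {
        "title": title,
        # Link to the original image, which stays in its own storage.
        "url": source_url,
        # Lightweight metadata derived from the bytes.
        "size_bytes": len(image_bytes),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


# Placeholder bytes standing in for a real image file.
image_bytes = b"\x89PNG fake image bytes"
doc = build_image_document(
    image_bytes,
    "https://example.com/images/logo.png",  # hypothetical URL
    "Company logo",
)

# JSON body that could be sent to an ingestion API; no binary data inside.
payload = json.dumps(doc)
print(payload)
```

Only the link and metadata end up in the index this way, so the size of the original images has no effect on shard storage.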

I hope this helps!

Thanks a lot, Sean.