App Search support for large attachments

We are trying to figure out the best solution for handling attachments with App Search. I saw a thread mentioning a hard 10 MB limit per file. One way to work around this App Search limitation is to build a custom intermediary that parses the files, either with Tika or with the Elasticsearch ingest attachment processor, converts the extracted content to JSON, and sends it to App Search.

We would like to know what approaches people have implemented and what issues or constraints we should consider.

Reference:

I did not see a data type called "attachments" in the App Search documentation.

I found a suggestion to use _simulate (which means posting the attachment to Elasticsearch first, then using the Elasticsearch _simulate response to get back the extracted data to be submitted to App Search).
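A minimal sketch of that _simulate round trip, in Python. It assumes the ingest attachment processor is available on the Elasticsearch cluster and that the request goes to POST /_ingest/pipeline/_simulate. The function names, the "data" source field, and the App Search field names ("body", "content_type") are hypothetical choices for illustration, not something App Search prescribes. The HTTP calls themselves are left out.

```python
import base64


def build_simulate_request(file_bytes: bytes) -> dict:
    """Build a body for POST /_ingest/pipeline/_simulate.

    The attachment processor expects the file as a Base64 string in the
    field named by "field" (here the hypothetical field "data").
    """
    return {
        "pipeline": {
            "processors": [
                {"attachment": {"field": "data"}}
            ]
        },
        "docs": [
            {"_source": {"data": base64.b64encode(file_bytes).decode("ascii")}}
        ],
    }


def to_app_search_document(doc_id: str, simulate_response: dict) -> dict:
    """Keep only the extracted text and content type, dropping the Base64 payload.

    The output field names are illustrative; use whatever your engine schema needs.
    """
    source = simulate_response["docs"][0]["doc"]["_source"]
    attachment = source.get("attachment", {})
    return {
        "id": doc_id,
        "body": attachment.get("content", ""),
        "content_type": attachment.get("content_type", ""),
    }
```

Because _simulate does not index anything, Elasticsearch is used here only as a text extractor. The resulting document carries the extracted text rather than the original binary, so it is usually much smaller than the file, although very large texts may still need to be split before being sent to App Search.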

or

Are we missing something?

Another solution is to use Tika: programmatically parsing my files with Apache Tika (should I convert them to Base64?)

Thank you in advance

There is a feature request about connecting FSCrawler with App Search, but it's not there yet.

Regarding App Search limits, is there any way to bypass the limit on results per query?

  • Is there a way to bypass the 10,000 paginated results limit?
  • Is there a way to get all IDs for a given Elasticsearch request?
  • Why is search limited to 10,000 records if Result Pages limit is 100 pages and Results per page limit is 1000?

I'm not sure this applies to App Search, but with Elasticsearch you can use:

  • the size and from parameters, which by default let you display up to 10,000 records to your users. If you want to change this limit, you can change the index.max_result_window setting, but be aware of the consequences (i.e. memory usage).
  • the search_after feature to do deep pagination.
  • the Scroll API if you want to extract a result set to be consumed by another tool later. (Not recommended anymore)
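As a rough sketch of the search_after option on the Elasticsearch side (not App Search), the loop below keeps requesting pages and passes the sort values of the last hit as search_after for the next request. The `search` argument is a hypothetical callable that sends the body to the _search endpoint and returns the parsed JSON response; the function name and parameters are assumptions for illustration. The sort must produce a unique, stable order (for example, by including a unique field as a tie-breaker), otherwise hits can be skipped or repeated.

```python
from typing import Callable, Iterator


def iter_all_hits(
    search: Callable[[dict], dict],
    query: dict,
    sort: list,
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every hit for a query by paging with search_after."""
    search_after = None
    while True:
        body = {"size": page_size, "query": query, "sort": sort}
        if search_after is not None:
            # Resume right after the last hit of the previous page.
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit of a sorted search carries its sort values under "sort".
        search_after = hits[-1]["sort"]
```

Collecting `hit["_id"]` from this generator is one way to get all IDs for a request without raising index.max_result_window. If the index changes while paging, combining search_after with a point in time keeps the view consistent.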
