Heya! App Search currently does not have anything out of the box to ingest files for you - you'd have to set up your own intermediary system of content extraction and send that data/JSON to App Search.
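For what it's worth, here is a rough sketch of what that intermediary step could look like, assuming the text has already been extracted into plain `.txt` files (extraction from PDFs, Office files and so on would need a separate tool and is not shown). The field names `title` and `body`, the helper names, and the batch size of 100 are assumptions for illustration; the `/api/as/v1/engines/<engine>/documents` path is the documents endpoint as I recall it, so please check it against the docs for your version. The request is only built here, not sent.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

BATCH_SIZE = 100  # assumed per-request document cap; verify for your version


def file_to_document(path: Path) -> dict:
    """Turn one already-extracted text file into a flat, JSON-ready document."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return {
        # Stable id derived from the path, so re-indexing overwrites the same document.
        "id": hashlib.sha1(str(path).encode("utf-8")).hexdigest(),
        "title": path.name,  # hypothetical field name
        "body": text,        # hypothetical field name
    }


def batches(items: list, size: int = BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_index_request(base_url: str, engine: str, api_key: str,
                        docs: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one batch of documents."""
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine}/documents"
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To actually index, you would loop over `batches(docs)` and pass each built request to `urllib.request.urlopen`, with whatever retry and error handling suits your setup.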
Alternatively, Workplace Search is capable of ingesting files and might be worth checking out as an option. Also CCing @nickchow on this one, feel free to chime in if you have any other alternative solutions for Sudharsanam!
Hi @nickchow @constancecchen,
If we index one million documents, each with a 10 MB content string after extraction, in a single engine in App Search,
will it be able to support that without crashing?
And for Workplace Search, please share any documentation on how we can ingest files.
@Subhasis_Dash Sounds like you have experience doing something like this with Elasticsearch. How was the performance for the scenario with 1M documents having 10MB of content?
The ability to fine-tune the mapping and settings would probably let you achieve a decent search experience with Elasticsearch, while the challenge with App Search becomes that you don't get much control over the mapping.
Sorry, but I have no experience using Elasticsearch.
The primary objective is to index each file's text contents in App Search. In this case, every file is a document in App Search.
We have to do this for millions of files. This does not necessarily mean that all documents will be 10 MB in size.
Is App Search a suitable tool for achieving the above?
@Subhasis_Dash - In theory, App Search is capable of this. If you're self-hosting on Elastic Cloud or another service, all that's required is to scale up your server specs/size/nodes etc. until they can support the level you're looking for. I personally don't have experience using App Search at the million-document scale, although I know we have 1 or 2 customers who have this number of documents.
@Sudharsanam - Unfortunately, I believe the 10MB API payload limit is currently a hard-coded cap and is not configurable. My suggestion for now would be to break your larger 10MB+ documents up into whatever equivalent of chapters you have, if possible.
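If your content has no natural chapters, one possible fallback is to split on paragraph breaks and keep each piece under a byte budget. The sketch below is only an illustration under assumptions: `split_text` and `to_chunk_documents` are hypothetical helpers, the `parent_id`, `chunk` and `body` fields are made-up names, and `max_bytes` should sit well below the real limit because JSON escaping and the other fields add to the payload size. The chunks are meant to be indexed as separate documents, not stitched back together.

```python
def _hard_split(para: str, max_bytes: int) -> list:
    """Cut one oversized paragraph into pieces of at most max_bytes UTF-8 bytes."""
    data = para.encode("utf-8")
    pieces, start = [], 0
    while len(data) - start > max_bytes:
        end = start + max_bytes
        # Back up so we never cut inside a multi-byte UTF-8 character.
        while (data[end] & 0xC0) == 0x80:
            end -= 1
        pieces.append(data[start:end].decode("utf-8"))
        start = end
    pieces.append(data[start:].decode("utf-8"))
    return pieces


def split_text(text: str, max_bytes: int) -> list:
    """Group paragraphs into chunks whose UTF-8 size is at most max_bytes."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 (largest UTF-8 character)")
    chunks, current, current_size = [], [], 0
    for para in text.split("\n\n"):
        for piece in _hard_split(para, max_bytes):
            size = len(piece.encode("utf-8"))
            sep = 2 if current else 0  # bytes for the "\n\n" joiner
            if current and current_size + sep + size > max_bytes:
                chunks.append("\n\n".join(current))
                current, current_size, sep = [], 0, 0
            current.append(piece)
            current_size += sep + size
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_chunk_documents(doc_id: str, title: str, text: str, max_bytes: int) -> list:
    """Build one document per chunk, each pointing back at its parent file."""
    return [
        {
            "id": f"{doc_id}-{i}",
            "parent_id": doc_id,  # hypothetical field for grouping results by file
            "chunk": i,           # hypothetical field: position within the file
            "title": title,
            "body": chunk,
        }
        for i, chunk in enumerate(split_text(text, max_bytes))
    ]
```

Keeping a `parent_id` on every chunk is one way to collapse search hits back to the original file on the client side; whether that fits depends on how you present results.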