Workplace Search - Text extraction from large PDF files

I added Dropbox as a content source to my Enterprise Search cloud instance. All files were indexed as expected and I can search across multiple sources. However, I noticed that for large PDF files (30+ pages), only the first ~10 pages are searchable. If I search for text that appears near the end of a file, no hits are returned.

Is this the expected behavior? Is there a setting I need to change to fix this?

Hi sam1325 :wave:

By default, Workplace Search only indexes the first 100kb of each file. You can configure this by changing the following setting

workplace_search.custom_api_source.document_size.limit: 100kb

in your enterprise-search.yml file.
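
As a rough sketch, raising the limit could look like the fragment below. The `500kb` value is only an illustrative assumption, not a recommendation; choose a value that covers the extracted text of your largest documents, and check the documentation for the accepted units. On a cloud deployment, I'd expect the setting to be applied through the deployment's Enterprise Search user settings rather than by editing the file directly.

```yaml
# enterprise-search.yml
# Illustrative value only (assumption): raises the per-document limit from the 100kb default.
workplace_search.custom_api_source.document_size.limit: 500kb
```

Documents that were already indexed may need to be re-synced before the text beyond the old limit becomes searchable.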

I think the name is a bit misleading, since the setting doesn't seem to apply only to the Custom API Source but to all sources. I will create an internal ticket to investigate further.

Thank you for the answer. I changed the setting in the yml file and it worked as expected!