How can I disable content extraction?

I am using App Search engines. By default, my crawler extracts the content from web pages and PDFs. When I run the crawl for one particular App Search engine, though, I only want the metadata of the web pages and PDFs to be extracted, not their content. How can I achieve this? Any help would be appreciated. Thanks.

Hi @maddy30,

Looks like this might be related to your other question here: How can i update the pipeline used for a app search engine?

The configurations to extract content from files (like PDFs) are made at a deployment level, not on an engine-by-engine basis. What you could do is add conditionals to your ingest pipeline to run certain processors only if the URL matches a certain domain or pattern.
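As a rough sketch of the conditional approach, the Python snippet below builds an ingest pipeline body whose `remove` processor only runs when its `if` condition (a Painless expression) matches. The `url` field name, the `docs.example.com` pattern, and the helper name are assumptions for illustration; check which fields your crawled documents actually contain before using anything like this.

```python
import json


def build_conditional_pipeline(url_pattern: str) -> dict:
    """Build an ingest pipeline body that drops body_content for matching URLs only."""
    # Painless condition. Assumes each crawled document keeps its address
    # in a string field named `url` (an assumption, not confirmed here).
    condition = f"ctx.url != null && ctx.url.contains('{url_pattern}')"
    return {
        "description": "Drop extracted content for selected URLs (illustrative)",
        "processors": [
            {
                "remove": {
                    "field": "body_content",
                    "ignore_missing": True,  # do not fail if the field is absent
                    "if": condition,         # processor runs only when this is true
                }
            }
        ],
    }


# Hypothetical pattern; replace with the domain or path you want to match.
pipeline = build_conditional_pipeline("docs.example.com")
print(json.dumps(pipeline, indent=2))
```

The printed JSON is the kind of body you would add to (or merge into) the pipeline definition; documents whose URL does not match keep their `body_content` untouched.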

Alternatively, you can take the approach I suggest in the other post: use a different pipeline per index, and have some of those pipelines remove the body_content field from your documents before they are indexed.
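To illustrate the per-index idea, here is a minimal Python sketch. It defines a pipeline body that removes `body_content` unconditionally, a settings body that points an index at that pipeline through `index.default_pipeline`, and a small local function that mimics the effect on a sample document. The pipeline name `metadata-only` and the sample document are hypothetical, and which index backs your engine is something you would need to confirm for your deployment.

```python
import json

# Hypothetical pipeline name, used only for this example.
PIPELINE_NAME = "metadata-only"

# Body for the create-pipeline API (PUT _ingest/pipeline/<name>).
pipeline_body = {
    "description": "Keep metadata, drop extracted content (illustrative)",
    "processors": [
        {"remove": {"field": "body_content", "ignore_missing": True}}
    ],
}

# Body for the update-settings API (PUT <index>/_settings), so that documents
# written to that one index pass through the pipeline by default.
settings_body = {"index.default_pipeline": PIPELINE_NAME}


def strip_content(doc: dict, fields=("body_content",)) -> dict:
    """Local stand-in for the remove processor: return a copy without the given fields."""
    return {key: value for key, value in doc.items() if key not in fields}


# Made-up crawled document, for demonstration only.
sample = {
    "title": "Annual report",
    "url": "https://example.com/report.pdf",
    "body_content": "Full extracted text of the PDF",
}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(settings_body, indent=2))
print(strip_content(sample))
```

Indices that should keep the full content would simply not use this pipeline.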
