Hi all,
We are using content extraction rules with CSS selectors as described in the Elastic documentation: Web crawler content extraction rules (Enterprise Search documentation [8.11]).
We've found out that a page is NOT indexed if the element referenced in the rule does NOT exist in the page. That means the crawler is not very fault tolerant.
For example, we want to extract the canonical link tag into the string field displayurl, using the following selector (which is written in XPath syntax rather than CSS): html/head/link[@rel="canonical"]/@href
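As a rough illustration of the lookup in question, here is a minimal Python sketch. It is not how the crawler is implemented; it uses only the standard library, and the two page sources and the function name extract_canonical are made up for the example. It shows the same selector finding a value on one page and nothing on another, which is the case where we would expect the field to be left empty instead of the page being skipped.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, well-formed pages: one with a canonical link, one without.
PAGE_WITH = (
    '<html><head><title>A</title>'
    '<link rel="canonical" href="https://example.com/a"/>'
    '</head><body/></html>'
)
PAGE_WITHOUT = '<html><head><title>B</title></head><body/></html>'


def extract_canonical(page_source):
    """Return the canonical href, or None when the page has no such tag."""
    root = ET.fromstring(page_source)  # root is the <html> element
    # Equivalent of html/head/link[@rel="canonical"], relative to the root.
    link = root.find("./head/link[@rel='canonical']")
    if link is None:
        return None  # element missing: no value, but the page is still usable
    return link.get("href")


if __name__ == "__main__":
    print(extract_canonical(PAGE_WITH))
    print(extract_canonical(PAGE_WITHOUT))
```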
How can we extract information which is not available on each page?
Regards
Sebastian