We are using the Elastic Web Crawler and are trying to exclude content by using content extraction rules based on CSS selectors.
To be clear, we do not want to select content from the DOM via a rule into a field; we want to exclude content and send the "rest" to a field.
An example of such a selector is main div:not(:has(nav, button)).
That should exclude all nav and button tags below any div. It does not work properly: it seems that the crawler does not support multiple selectors separated by a comma, e.g. (nav, button).
If we use a single selector it works, e.g.
main div:not(:has(button)). But we also cannot express this with multiple rules, because the rules do not work as a pipeline where each rule consumes the output of the previous one.
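A possible workaround (untested with the crawler; based on the MDN equivalence quoted below) would be to unroll the comma-separated list into chained pseudo-classes, which is still a single selector per rule:

```css
/* Apparently unsupported: a selector list inside :has() */
main div:not(:has(nav, button))

/* Equivalent chained form, one pseudo-class per excluded tag */
main div:not(:has(nav)):not(:has(button))
```

If single-selector rules work as described above, the chained form may behave correctly even without comma support.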
As mentioned in Web crawler content extraction rules | Enterprise Search documentation [8.10] | Elastic, extraction rules support CSS Level 3, as described in Selectors Level 3 (w3.org).
And CSS Level 3 supports groups of selectors, which effectively act as a "single selector".
The MDN page :not() - CSS: Cascading Style Sheets | MDN (mozilla.org) also describes that comma-separated selectors can be used:
- You can negate several selectors at the same time. Example:
:not(.foo, .bar) is equivalent to :not(.foo):not(.bar).
Therefore :not() and :has() should also support selector lists such as :has(nav, button) or :not(nav, button).
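To make the expected behaviour concrete, here is a minimal Python sketch (stdlib only, purely illustrative; this is not how the crawler is implemented) that drops nav and button subtrees below a div and keeps the "rest" of the text:

```python
from html.parser import HTMLParser


class ExcludeTags(HTMLParser):
    """Collect text while skipping <nav>/<button> subtrees inside a <div>."""

    EXCLUDE = {"nav", "button"}

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.skip_depth = 0  # > 0 while inside an excluded subtree
        self.parts = []      # text fragments that survive the exclusion

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            self.skip_depth += 1  # nested tag inside an excluded subtree
        elif tag in self.EXCLUDE and "div" in self.stack:
            self.skip_depth = 1   # start skipping this subtree
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


html = "<main><div>keep<nav>drop</nav><button>drop</button><p>also keep</p></div></main>"
p = ExcludeTags()
p.feed(html)
print(" ".join(p.parts))  # prints: keep also keep
```

This is the result we would expect the extraction rule to produce: the nav and button content is gone, and everything else is sent to the field.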
Could you have a look into this?