Web crawler body_content should not include the contents of the title tag

When using the web crawler feature, resulting documents have a body_content that starts with the same string as the <title>. I have no idea why.

The Web crawler reference in the Elastic App Search documentation (8.3) specifically says that body_content comes from the <body> tag. Having the title prepended to it screws up the relevancy of snippets when the search term appears in the title.
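To show what "body_content comes from the <body> tag" means in practice, here is a minimal sketch using Python's standard-library html.parser. It is not the crawler's actual implementation. The class and function names (BodyTextExtractor, body_text) are hypothetical, and skipping <script>/<style> is an assumption added for realism. Text inside <head><title> is never collected, so the title cannot leak into the result.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Hypothetical helper: collects text found inside <body> only."""

    # Assumption: script/style contents are not meaningful body text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text in <head> (including <title>) arrives while in_body is False,
        # so it is ignored here.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def body_text(html: str) -> str:
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = """<html>
<head><title>My Page Title</title></head>
<body><h2>Section heading</h2><p>Some paragraph text.</p></body>
</html>"""

print(body_text(page))
# → Section heading Some paragraph text.
```

Under that reading of the docs, the h2 text belongs in body_content and the <title> text does not.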

Hi @Antonio_Gutierrez! Could you share the URL or the page source so we can take a closer look?

Thanks!

Open support case: #00994044
Example:

Crawled doc:

The title should not be in the body. The h2 should be, and is as expected.
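Until a fixed version is available, one possible client-side workaround is to strip a leading copy of the title from body_content before using it. The sketch below assumes the crawled document is available as a plain dict with title and body_content fields, as named in this thread. The function strip_title_prefix is hypothetical and is not part of any Elastic API.

```python
def strip_title_prefix(title: str, body_content: str) -> str:
    """Hypothetical workaround: drop a leading copy of `title` from `body_content`."""
    t = title.strip()
    b = body_content.lstrip()
    if t and b.startswith(t):
        # Remove the duplicated title and any whitespace that followed it.
        return b[len(t):].lstrip()
    # No title prefix found: leave the field untouched.
    return body_content


# Assumed document shape, mirroring the crawled doc described above.
doc = {
    "title": "My Page Title",
    "body_content": "My Page Title Section heading Some paragraph text.",
}
doc["body_content"] = strip_title_prefix(doc["title"], doc["body_content"])
print(doc["body_content"])
# → Section heading Some paragraph text.
```

One caveat: if a page's <body> genuinely begins with the same text as its <title> (for example, an <h1> identical to the title), this heuristic strips that text as well.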

This has been confirmed as a bug, thanks for raising it!

We'll keep you updated on the fix version. Stay tuned!