When using the web crawler feature, resulting documents have a body_content that starts with the same string as the <title>. I have no idea why.
The Web crawler reference (Elastic App Search Documentation [8.3]) specifically says that body_content comes from the <body> tag. Having the title prepended screws up the relevancy of snippets when the search term appears in the title.
Carlos_D
(Carlos)
July 20, 2022, 12:38pm
2
Hi @Antonio_Gutierrez ! Could you share the URL or page source, so we can take a closer look?
Thanks!
Open support case: #00994044
Example:
Crawled doc:
The title should not be in the body. The h2 should be, and is as expected.
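Until a fix ships, one way to check whether a crawled document is affected, and to work around it on the client side, is to strip the leading title from body_content. This is only a minimal sketch, assuming documents are handled as plain dicts with `title` and `body_content` fields. The function name `strip_leading_title` and the sample document are hypothetical and are not taken from the crawler or from the support case.

```python
def strip_leading_title(doc: dict) -> dict:
    """Return a copy of doc whose body_content no longer starts with the title.

    Assumption: the duplicated title appears verbatim at the very start of
    body_content. Documents that do not match are returned unchanged.
    """
    title = (doc.get("title") or "").strip()
    body = (doc.get("body_content") or "").lstrip()
    cleaned = dict(doc)
    if title and body.startswith(title):
        # Drop the duplicated title and any whitespace that followed it.
        cleaned["body_content"] = body[len(title):].lstrip()
    return cleaned


# Hypothetical crawled document, made up for illustration only.
sample = {
    "title": "Example Page",
    "body_content": "Example Page Section heading Some paragraph text.",
}
print(strip_leading_title(sample)["body_content"])
# prints: Section heading Some paragraph text.
```

Note that a plain prefix check would also strip a page whose body legitimately begins with the same words as its title, so treat this as a stopgap rather than a fix.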
Carlos_D
(Carlos)
July 26, 2022, 9:47am
4
This has been confirmed as a bug, thanks for raising it!
We'll keep you updated on the fix version. Stay tuned!