Hi there,
I'm testing the App Search Webcrawler.
Is there a way to extract more metadata than the current standard ones?
The documentation mentions something about adding a template (see the "Web crawler reference" page in the Elastic App Search documentation, 8.4), but I can't find a way to implement this.
How can I enrich my documents with extra data without changing all my webpages?
Could you link us to an example page you're crawling, and/or provide a snippet of the tag content from the crawled pages where you're attempting custom document attributes? The instructions you linked to are indeed the way to set up custom document attributes.
Could you also confirm the version of Enterprise Search you're running?
Thanks for providing the example. The documentation you originally linked to describes the only approach currently supported. You will need to modify the crawled page(s) to include <meta ... > tags that the crawler will recognize and pick up as custom fields.
The good news is that we plan to introduce configurability to the crawler in the future that would not require introducing <meta> tags to your crawled content. However, I can't provide a date by which that would be available.
Great, I got it now.
It seems that I just had to add an extra field to the schema with the name of the meta tag.
The crawler then picked it up automatically.
This wasn't entirely clear to me from the documentation, but it's clear to me now.
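For anyone landing here later, here is a rough sketch of what the page-side markup looks like and a quick way to check that a page carries it before crawling. It assumes the convention described in the web crawler reference, where custom fields are declared as <meta> tags with class="elastic"; please verify that against the documentation for your version. The sample page, the field names (product_price, product_category) and the helper names are made up for illustration, and the script only inspects HTML locally. It does not talk to App Search.

```python
from html.parser import HTMLParser

# Hypothetical page: field names and values are made up for illustration.
# Assumption: the crawler reads custom fields from <meta class="elastic" ...> tags.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example product</title>
    <meta class="elastic" name="product_price" content="99.99">
    <meta class="elastic" name="product_category" content="shoes">
    <meta name="description" content="Ordinary meta tag, not a custom field">
  </head>
  <body><h1>Example product</h1></body>
</html>
"""


class ElasticMetaParser(HTMLParser):
    """Collects name/content pairs from <meta> tags whose class includes 'elastic'."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "elastic" in classes and attr.get("name"):
            self.fields[attr["name"]] = attr.get("content") or ""


def extract_custom_fields(html):
    """Return the custom fields declared on a page as a dict."""
    parser = ElasticMetaParser()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    # Lists the fields this page would be expected to expose to the crawler.
    print(extract_custom_fields(SAMPLE_PAGE))
```

As noted above, in my case the field name (the name attribute of the meta tag) also had to exist in the engine schema before the crawler populated it.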