How to extract metadata using the Webcrawler

Hi there,
I'm testing the App Search Webcrawler.
Is there a way to extract more metadata than the current standard ones?
The documentation mentions adding a template (see "Web crawler reference" in the Elastic App Search Documentation [8.4]), but I can't find a way to implement this.
How can I enrich my documents with extra data without changing all my webpages?

Best Regards,

Marten

Hey @Marten,

Could you link us to an example page you're crawling, and/or provide a snippet of the tags from your crawled pages that you're using to attempt custom document attributes? The instructions you linked to are indeed the way to set up custom document attributes.

Could you also confirm the version of Enterprise Search you're running?

Thanks
Ross

Hi Ross,

Thanks for your quick reply.
The page I want to crawl is:

I've been using the hosted App Search service on Elastic Cloud since yesterday, so I assume it's the latest version.

Best Regards,

Marten

Thanks for providing the example. The approach in the documentation you originally linked to is the only one currently supported. You will need to modify the crawled page(s) to include <meta ... > tags that the crawler will recognize and pick up as custom fields.
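
As a rough illustration of the kind of tag meant here, the sketch below embeds a sample page in a Python string and pulls out the meta tags marked for the crawler. It assumes the `class="elastic"` convention described in the web crawler reference. The field name `product_price`, the sample page, and the helper `extract_elastic_meta` are made up for the example, and this only mimics the idea; it is not how the crawler itself is implemented.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the field name and value are invented for illustration.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example product</title>
    <meta name="description" content="A standard meta tag">
    <meta class="elastic" name="product_price" content="99.99">
  </head>
  <body><p>Hello</p></body>
</html>
"""


class ElasticMetaParser(HTMLParser):
    """Collects name/content pairs from <meta class="elastic" ...> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        # Assumption: only tags carrying class="elastic" become custom fields.
        if "elastic" in classes and attr_map.get("name"):
            self.fields[attr_map["name"]] = attr_map.get("content") or ""


def extract_elastic_meta(html):
    """Return a dict of custom field names to values found in the page head."""
    parser = ElasticMetaParser()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    print(extract_elastic_meta(SAMPLE_PAGE))
```

Running a check like this against one of your own pages can be a quick way to confirm the tags are present in the served HTML before starting a crawl.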

The good news is that we plan to make the crawler configurable in the future, so that custom fields would not require adding <meta> tags to your crawled content. However, I can't provide a date by which that would be available.

Great, I got it now.
It seems that I just had to add an extra field to the schema with the name of the meta tag.
The crawler then picked it up automatically.
This wasn't entirely clear to me from the documentation, but it's clear to me now.
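
For anyone who prefers to add the schema field through the API rather than the dashboard, here is a minimal sketch of what that request could look like. It assumes the App Search schema endpoint `/api/as/v1/engines/<engine>/schema` and a private API key. The base URL, engine name, key, and field name below are placeholders, and `build_schema_request` is a hypothetical helper, so please check the details against the schema API documentation for your version.

```python
import json
import urllib.request


def build_schema_request(base_url, engine_name, api_key, field_name, field_type="text"):
    """Build (but do not send) a request that adds one field to an engine schema."""
    # Assumption: schema updates are POSTed to this path as {"field_name": "type"}.
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine_name}/schema"
    body = json.dumps({field_name: field_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own deployment details.
    request = build_schema_request(
        "https://example.ent.example.com",
        "my-engine",
        "private-xxxxxxxx",
        "product_price",
        "number",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

The field name should match the `name` attribute of the meta tag, as described above.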

Thanks for your help,

Marten
