Elastic crawler metadata content extraction

pngworkforce · October 21, 2024, 2:51pm

Hello!

Is there a way to extract HTML metadata fields with the Elastic crawler without setting class="elastic" on them? We have inherited a large flat-file HTML site that we would like to index, and it would take significant time to add this class to all files.

Is this something that can be done using the Web crawler content extraction rules?

Thanks

Imran

Sean_Story · October 21, 2024, 3:00pm

Yes, this can be done with the Elastic Web Crawler's Content Extraction Rules. See the "Web crawler content extraction rules" page in the Enterprise Search documentation [8.15] (note that this feature is not available for the App Search Web Crawler).

You can use either XPath or CSS selectors to identify the element you'd like to extract to its own field.
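
Before writing rules against a large site, it can help to check locally that a selector matches the meta tags you expect. The following is only a rough sketch in Python, not the crawler's own rule format: the sample page, the field names, and the extract_meta_fields helper are all made up for illustration, and the standard-library parser used here supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, kept well-formed so the stdlib XML parser can read it.
# Note that the meta tags carry no class="elastic" attribute.
SAMPLE_PAGE = """<html>
  <head>
    <title>Staff handbook</title>
    <meta name="author" content="Example Author" />
    <meta name="department" content="HR" />
  </head>
  <body><p>Body text</p></body>
</html>"""

# Hypothetical mapping: target field name -> XPath selector for the element.
FIELD_SELECTORS = {
    "author": ".//meta[@name='author']",
    "department": ".//meta[@name='department']",
}


def extract_meta_fields(html_text, selectors):
    """Return {field: content attribute} for each selector that matches."""
    root = ET.fromstring(html_text)
    fields = {}
    for field, xpath in selectors.items():
        node = root.find(xpath)  # first matching element, or None
        if node is not None:
            fields[field] = node.get("content")
    return fields


if __name__ == "__main__":
    print(extract_meta_fields(SAMPLE_PAGE, FIELD_SELECTORS))
```

A selector that works here still needs to be entered as an extraction rule in the crawler itself. The documentation page mentioned above describes how rules are defined.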

pngworkforce · October 21, 2024, 3:03pm

Thanks Sean, I appreciate the fast response.

system · November 18, 2024, 3:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
