Upon first ingestion, a default text-based schema was created for me.
I need to create a custom schema for the web crawler to index.
The default index did a good job, but it did not grab the product image, which is critical for my launch.
I want to grab the product image from here, but the href for the images does not contain the website domain; it is a CDN link. How can I change the configuration rules to grab this link?
The best option for retrieving a specific field is to use meta tags or content extraction. Both require modifying the page HTML so that the crawler can find the specific field.
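
As a rough illustration of the meta tag approach, here is a minimal Python sketch that finds the product image URL in a page and builds a meta tag you could add to the page's head in your site templates. Several things here are assumptions and not taken from the thread: the `product-image` CSS class, the `product_image` field name, the example CDN URL, and the `class="elastic"` meta tag convention (which assumes the Elastic web crawler). Check your crawler's documentation for the exact attribute names it expects.

```python
from html import escape
from html.parser import HTMLParser


class ProductImageFinder(HTMLParser):
    """Collects the src of the first <img> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        # Only look at <img> tags, and stop once an image has been found.
        if tag != "img" or self.image_url is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes:
            self.image_url = attr_map.get("src")


def build_image_meta_tag(page_html, target_class="product-image",
                         field_name="product_image"):
    """Return a meta tag string carrying the image URL, or None if no image matches.

    target_class and field_name are hypothetical names used for illustration.
    The class="elastic" attribute is an assumed crawler convention.
    """
    finder = ProductImageFinder(target_class)
    finder.feed(page_html)
    if finder.image_url is None:
        return None
    return '<meta class="elastic" name="{}" content="{}">'.format(
        escape(field_name, quote=True),
        escape(finder.image_url, quote=True),
    )


# Hypothetical product page whose image is served from a CDN domain.
sample_page = (
    '<html><body>'
    '<img class="product-image" src="https://cdn.example.com/p/123.jpg">'
    '</body></html>'
)
meta_tag = build_image_meta_tag(sample_page)
print(meta_tag)
```

Because the image URL is carried in the meta tag's content attribute as plain text, it should not matter that the CDN domain differs from the domain being crawled; the crawler would only need to read the value, not follow the link.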