Quick question for the community: has anyone here tried indexing pages that are heavy on visuals?
I’ve seen cases where full-text search works fine for text-only articles, but when a page has lots of images and only short descriptions (example: fashionismic-edgar-haircut), the results feel less accurate.
Do you usually stick to body text only, or include extra metadata like captions/alt tags in your indices?
Good question. When dealing with image-heavy pages, it helps to enrich each document with metadata such as captions, alt text, and the text surrounding each image. Index those as separate text fields with suitable analyzers, then use per-field boosts or a function_score query to balance their weight against the body. Body text stays important, but the metadata gives extra signals that can improve relevance.
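A minimal sketch of that setup, assuming Elasticsearch and hypothetical field names `body`, `caption`, and `alt_text` with illustrative boost values. It only builds the mapping and query request bodies as plain Python dicts; sending them through a client is left out.

```python
import json
from typing import Dict, Optional

# Hypothetical field names and boost values; adapt them to your own schema.
FIELD_BOOSTS: Dict[str, float] = {
    "body": 1.0,      # main page text keeps a baseline weight
    "caption": 2.0,   # captions are short and descriptive, so weight them up
    "alt_text": 1.5,  # alt text describes the image itself
}


def build_mapping(analyzer: str = "english") -> dict:
    """Index body text and image metadata as separate analyzed text fields."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer}
                for field in FIELD_BOOSTS
            }
        }
    }


def build_query(user_text: str, boosts: Optional[Dict[str, float]] = None) -> dict:
    """multi_match query over all fields, using the field^boost syntax."""
    boosts = boosts or FIELD_BOOSTS
    return {
        "query": {
            "multi_match": {
                "query": user_text,
                # "most_fields" adds up the scores of every matching field,
                # so a hit in both body and caption ranks higher than either alone.
                "type": "most_fields",
                "fields": [f"{name}^{weight}" for name, weight in boosts.items()],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
    print(json.dumps(build_query("short haircut"), indent=2))
```

The boost values are placeholders; tune them against a set of queries whose expected results you know, rather than trusting these numbers.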