Any analyzer / mapping suggestions for Web search?


(Will Ezell) #1

We are planning to develop site search functionality to replace some
aging and painful Nutch infrastructure. We are obviously not looking to
rebuild Google, but assuming we have a decent spider, does anyone
have analyzer/mapping suggestions for a site search index?

Off the top of my head, the mapping would include fields like:

host
site
content length
url
mime/type
title
description (from html or tika)
content
last modified
keywords
language
author
geo_stuff
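
To make the field list concrete, here is a rough sketch of how it might translate into an Elasticsearch mapping, written as a Python dict so it can be dumped to JSON. Everything in it is an assumption rather than something confirmed in this thread: the snake_case field names, the choice of `keyword` versus `text`, the `standard` analyzer, and treating "geo_stuff" as a single `geo_point` field called `location`. It also assumes a recent Elasticsearch release with the `text`/`keyword` types; older releases used `string` with `"index": "not_analyzed"` instead.

```python
import json

# Hypothetical mapping built from the field list above. Field names, types
# and analyzers are illustrative assumptions, not a recommendation.
SITE_SEARCH_MAPPING = {
    "properties": {
        # Exact-match fields, useful for filtering and faceting.
        "host": {"type": "keyword"},
        "site": {"type": "keyword"},
        "url": {"type": "keyword"},
        "mime_type": {"type": "keyword"},
        "keywords": {"type": "keyword"},
        "language": {"type": "keyword"},
        "author": {"type": "keyword"},
        # Numeric and date fields.
        "content_length": {"type": "long"},
        "last_modified": {"type": "date"},
        # Full-text fields; swap in a language-specific analyzer if needed.
        "title": {"type": "text", "analyzer": "standard"},
        "description": {"type": "text", "analyzer": "standard"},
        "content": {"type": "text", "analyzer": "standard"},
        # "geo_stuff" is assumed here to be a single lat/lon point.
        "location": {"type": "geo_point"},
    }
}


def mapping_as_json(mapping=SITE_SEARCH_MAPPING):
    """Return a JSON body shaped like an index-creation request."""
    return json.dumps({"mappings": mapping}, indent=2)


if __name__ == "__main__":
    print(mapping_as_json())
```

The split between `keyword` and `text` is the main design choice: fields you filter or facet on (host, mime type, language) stay unanalyzed, while title, description and content go through an analyzer for full-text matching.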

I was wondering if anyone has experience with this, or analyzer / mapping
definitions they'd be willing to share?

Will


Will Ezell


(egaumer) #2

We use Nutch to crawl and feed Elasticsearch, so you might be able to
continue using Nutch as your crawler.

-Eric

On Friday, March 9, 2012 9:29:23 AM UTC-5, Will Ezell wrote:

We are planning to develop a site search type functionality to replace
some aging and painful nutch infrastructure. We are obviously not looking
to rebuild google, but assuming we have a decent spider, does anyone
have analyzer/mapping suggestions for a site search index?

Will Ezell
http://dotcms.com

