Any analyzer / mapping suggestions for Web search?

We are planning to build site-search functionality to replace some
aging and painful Nutch infrastructure. We are obviously not looking to
rebuild Google, but assuming we have a decent spider, does anyone
have analyzer/mapping suggestions for a site search index?

Off the top of my head, the mapping would include fields like:

host
site
content length
url
mime/type
title
description (from HTML or Tika)
content
last modified
keywords
language
author
geo_stuff
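
To make the list above concrete, here is a rough sketch of what such a mapping might look like, written as a Python dict that serializes to the JSON body of an index-creation request. Everything in it is an assumption rather than a tested definition: the field names (content_length, mime_type, geo_location) are just normalized versions of the list, the types assume a recent Elasticsearch with the text/keyword split and typeless mappings, and the "standard" analyzer is only a placeholder for whatever analyzer turns out to fit.

```python
import json

# Hypothetical mapping for a site-search index. Field names follow the list
# above; the types are assumptions for a recent Elasticsearch version.
SITE_SEARCH_MAPPING = {
    "properties": {
        # Exact-match fields, useful for filtering and faceting.
        "host": {"type": "keyword"},
        "site": {"type": "keyword"},
        "url": {"type": "keyword"},
        "mime_type": {"type": "keyword"},
        "language": {"type": "keyword"},
        "keywords": {"type": "keyword"},
        # Numeric and date metadata.
        "content_length": {"type": "long"},
        "last_modified": {"type": "date"},
        # Full-text fields; "standard" is a placeholder analyzer.
        "title": {"type": "text", "analyzer": "standard"},
        "description": {"type": "text", "analyzer": "standard"},
        "content": {"type": "text", "analyzer": "standard"},
        # Searchable as text, with an exact-match sub-field for faceting.
        "author": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        # Stand-in for "geo_stuff": a single lat/lon point per document.
        "geo_location": {"type": "geo_point"},
    }
}


def index_body(mapping):
    """Wrap a mapping in the body shape used when creating an index."""
    return {"mappings": mapping}


if __name__ == "__main__":
    print(json.dumps(index_body(SITE_SEARCH_MAPPING), indent=2))
```

The split between keyword and text fields is the main design choice here: anything used for filtering or facets (host, site, mime type, language) stays unanalyzed, while title, description and content are analyzed for full-text search.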

Does anyone have experience with this, or analyzer/mapping
definitions they'd be willing to share?

Will


Will Ezell

We use Nutch to crawl and feed Elasticsearch, so you might be able to
keep using Nutch as your crawler.
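
As a generic illustration of the "feed" step (a sketch, not a description of any particular Nutch setup), crawled pages can be turned into a request body for the Elasticsearch bulk API, which takes newline-delimited JSON: one action line followed by one document line per page. The index name "site_search", the page dict shape, and the use of the URL as the document id are all assumptions made for the example.

```python
import json


def to_bulk_lines(pages, index_name="site_search"):
    """Yield bulk-API NDJSON lines: an action line, then the document itself."""
    for page in pages:
        # Assumption: the URL is unique per page and short enough to be an id.
        action = {"index": {"_index": index_name, "_id": page["url"]}}
        yield json.dumps(action)
        yield json.dumps(page)


def bulk_body(pages, index_name="site_search"):
    """Join the lines into one body; the bulk API expects a trailing newline."""
    return "\n".join(to_bulk_lines(pages, index_name)) + "\n"


if __name__ == "__main__":
    crawled = [{"url": "http://example.com/", "title": "Home", "content": "Welcome"}]
    print(bulk_body(crawled), end="")
```

The resulting string would be sent as the body of a POST to the _bulk endpoint with the content type application/x-ndjson.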

-Eric

On Friday, March 9, 2012 9:29:23 AM UTC-5, Will Ezell wrote:


Will Ezell
http://dotcms.com