How to group similar bucket terms for faceted search (misspellings, alternate capitalization, etc.)


I'm trying to implement a faceted search where aggregation bucket terms are returned and displayed to the user, so that the user can select one or more of the bucket terms to filter the results.

I'm currently indexing these fields as multi-fields: one version uses the default analyzer, and another (raw) uses no analyzer. I perform the aggregations and filtering on the raw field.
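For reference, the setup looks roughly like the sketch below, written as Python dicts that would be sent as request bodies. The field name `author` and the helper `selected_filter` are hypothetical, and the `text`/`keyword` types assume a recent Elasticsearch version (older versions would use `string` with `not_analyzed` instead).

```python
# Hypothetical field name "author"; assumes "text" is analyzed with the
# default analyzer and "keyword" is indexed verbatim (no analysis).
mapping = {
    "mappings": {
        "properties": {
            "author": {
                "type": "text",  # default analyzer, used for full-text search
                "fields": {
                    # unanalyzed sub-field, used for aggregations and filtering
                    "raw": {"type": "keyword"}
                },
            }
        }
    }
}

# Terms aggregation on the raw sub-field to build the facet buckets.
facet_query = {
    "size": 0,
    "aggs": {"authors": {"terms": {"field": "author.raw"}}},
}


def selected_filter(values):
    """Build the filter applied when the user selects one or more buckets."""
    return {"query": {"bool": {"filter": [{"terms": {"author.raw": list(values)}}]}}}
```

Because the raw sub-field stores each value verbatim, any difference in spelling, order, or case produces a separate bucket.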

My data comes from a third party, and the terms include minor inconsistencies that split what should be a single bucket into several. Human names are a particular issue: I'm finding last name first in some cases and first name first in others. Sometimes there is a middle initial. Capitalization varies, and occasionally there is a misspelling. Company names are similarly an issue.
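To make the problem concrete, here is a small Python sketch using made-up names (not my real data). It counts buckets by exact string, the way an unanalyzed field does, and then by a naive grouping key. The `name_key` helper is purely illustrative of the kind of grouping I'm after, not something Elasticsearch provides.

```python
from collections import Counter
import re

# Made-up sample values, for illustration only.
names = ["John Smith", "Smith, John", "john smith", "John Q. Smith", "Jon Smith"]

# Exact matching, as on an unanalyzed field: every variant is its own bucket.
exact_buckets = Counter(names)


def name_key(name):
    """Lowercase, drop punctuation and single-letter initials, sort the tokens."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(t for t in tokens if len(t) > 1))


# Grouping by the normalized key merges order, case, and initial variants.
grouped_buckets = Counter(name_key(n) for n in names)

print(len(exact_buckets))     # 5 separate buckets
print(dict(grouped_buckets))  # {'john smith': 4, 'jon smith': 1}
```

A key like this handles name order, capitalization, and middle initials, but the misspelling ("Jon") still ends up in its own bucket, so it only covers part of what I need.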

If possible, I'd like to group these buckets. Short of changing the input data, is there a good way to do this, e.g. by using a different analyzer?
