Searching Multiple Indexes / Addressing Duplication and Boosting


(Brian Crosen) #1

EDIT: It turns out that I don't have to do this now. I can't find a way to delete the post, so feel free to ignore it.

I've been asked to determine the feasibility of indexing bibliographic records in Elasticsearch with the following requirements:

  1. Each field in a record must have its own index. So we would have an index for titles, an index for authors, an index for subjects, etc. The reason is that if we need to change the way a particular field is indexed, we would only need to recreate the index for that field.

  2. We must have the ability to construct a query that will search over multiple indexes, apply boosting to the results appropriately (as if boosting were applied to fields in a single index), and return record identifiers that are not duplicated. So if we have a match on the author index and a match on the title index that reference the same bibliographic record, we would have one record id in the result set.

We are using spring-data-elasticsearch.

I was able to construct a query that is made up of several Indices Queries that are put together with Bool Queries as described here: http://grokbase.com/t/gg/elasticsearch/125e7xswbd/separate-queries-on-each-index-to-be-searched
The problem I'm running into with that is the search results contain duplicate record IDs when multiple indexes return matches for the same record, and I suspect that the boosting is only being applied per-index rather than across all matching indexes.
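
For illustration, the behavior in requirement 2 could be approximated on the client side by merging the hits after the per-index searches return. The following is only a minimal sketch under assumptions: the `MergedSearch` class, the `Hit` shape, and the index names and boost values are hypothetical and are not part of Elasticsearch or spring-data-elasticsearch. It multiplies each hit's score by a boost for the index it came from, sums the boosted scores per record ID, and returns each record ID once, best score first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergedSearch {

    /** One hit from one per-field index (hypothetical shape, not a library type). */
    public static final class Hit {
        final String index;     // e.g. "titles" or "authors" (assumed names)
        final String recordId;  // bibliographic record the hit refers to
        final double score;     // score reported by that index

        public Hit(String index, String recordId, double score) {
            this.index = index;
            this.recordId = recordId;
            this.score = score;
        }
    }

    /**
     * Collapses hits from several indexes into unique record IDs.
     * Each score is multiplied by the boost of its source index
     * (1.0 when no boost is configured) and summed per record.
     */
    public static List<String> merge(List<Hit> hits, Map<String, Double> boosts) {
        Map<String, Double> combined = new HashMap<>();
        for (Hit hit : hits) {
            double boost = boosts.getOrDefault(hit.index, 1.0);
            combined.merge(hit.recordId, hit.score * boost, Double::sum);
        }
        // Sort record IDs by combined score, highest first.
        List<String> ids = new ArrayList<>(combined.keySet());
        ids.sort((a, b) -> Double.compare(combined.get(b), combined.get(a)));
        return ids;
    }
}
```

Two caveats with this kind of merge: scores computed in separate indexes are based on each index's own term statistics, so they are not strictly comparable, and paging has to be handled by the client because the combined ranking only exists after the merge.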

Can anyone tell me if this can be done?

