Compare relevance for different document types


(Barry Woods) #1

We have a system with many different entity types (documents, meetings, contacts, etc.) and are trying to implement a global search page that searches across all of them.

I understand that TF/IDF scores are not comparable across different entity types, but we are perfectly happy to live without TF/IDF.

The approach we were hoping to implement was to assign each searchable field on each entity to one of four scoring levels. For example, the title of a document, the title of a meeting and the last name of a contact would all be rank 1 properties, so a perfect match on any one of them would end up with the same score.
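To make the intended scheme concrete, here is a minimal sketch of the rank table in Python. The entity names, field names and score values are all hypothetical placeholders chosen for illustration; only the idea of four levels shared across entity types comes from the description above.

```python
# Hypothetical fixed score per rank level (illustrative numbers only).
RANK_SCORES = {1: 8.0, 2: 4.0, 3: 2.0, 4: 1.0}

# Hypothetical (entity type, field) -> rank assignments.
FIELD_RANKS = {
    ("document", "title"): 1,
    ("meeting", "title"): 1,
    ("contact", "last_name"): 1,
    ("document", "summary"): 2,
    ("document", "body"): 3,
    ("contact", "notes"): 4,
}


def score_for(entity_type: str, field: str) -> float:
    """Return the fixed score a perfect match on this field should get."""
    return RANK_SCORES[FIELD_RANKS[(entity_type, field)]]


if __name__ == "__main__":
    for entity_type, field in FIELD_RANKS:
        print(entity_type, field, score_for(entity_type, field))
```

The point of the table is that the score depends only on the rank, never on which entity type the field belongs to.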

We tried to do it using constant scores but queryNorm is messing up the numbers. From what I have read, there isn't anything we can do to set the queryNorm or even find out what it is at query time to compensate for it.
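For reference, the constant-score attempt looked roughly like the sketch below, written as a Python dict that serialises to the query body. The field names and boost values are hypothetical, and it assumes an Elasticsearch version in which a `match` query can be used inside the `filter` of a `constant_score` query.

```python
import json

# Hypothetical boost per rank level and hypothetical fields in each rank.
RANK_BOOSTS = {1: 8.0, 2: 4.0}
FIELDS_BY_RANK = {1: ["title", "last_name"], 2: ["summary", "first_name"]}


def constant_rank_query(text: str) -> dict:
    """Build one constant_score clause per field, combined with dis_max."""
    clauses = []
    for rank, fields in sorted(FIELDS_BY_RANK.items()):
        for field in fields:
            clauses.append({
                "constant_score": {
                    # The filter only decides whether the field matches;
                    # the boost is meant to be the resulting score.
                    "filter": {"match": {field: text}},
                    "boost": RANK_BOOSTS[rank],
                }
            })
    # dis_max keeps the best-scoring clause; tie_breaker 0.0 ignores the rest.
    return {"query": {"dis_max": {"queries": clauses, "tie_breaker": 0.0}}}


if __name__ == "__main__":
    print(json.dumps(constant_rank_query("woods"), indent=2))
```

The scores that come back are the boosts scaled by queryNorm, not the boosts themselves, which is the problem described above.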

We also looked into function scores, but we struggled to implement them in a way that would allow the cleverer cases: working out the score when a hit is partly in a rank 1 field and partly in a rank 2 field, or applying an extra boost to the result of the function based on how old the item is or how many views it has had.
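The kind of extra boost we have in mind would look something like the following sketch, which wraps an existing query in `function_score`. The field names `view_count` and `created_at`, the 30 day scale and the chosen modes are assumptions made for illustration, not a working solution to the mixed-rank problem.

```python
import json


def with_views_and_recency(base_query: dict) -> dict:
    """Wrap base_query so its score is scaled by views and recency."""
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    # Hypothetical numeric field holding the view count.
                    {"field_value_factor": {
                        "field": "view_count",
                        "modifier": "log1p",
                        "missing": 0,
                    }},
                    # Hypothetical date field; value halves 30 days from now.
                    {"gauss": {
                        "created_at": {
                            "origin": "now",
                            "scale": "30d",
                            "decay": 0.5,
                        }
                    }},
                ],
                # Add the two function values together ...
                "score_mode": "sum",
                # ... then multiply the base query score by that sum.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    base = {"match": {"title": "quarterly review"}}
    print(json.dumps(with_views_and_recency(base), indent=2))
```

This handles the age and view-count boost on top of a base score, but it does not by itself say how a hit spread across a rank 1 field and a rank 2 field should be combined.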

Has anyone else tried this sort of thing and come up with a solution, or is there an easier way to handle this that we are missing?

Thanks

