Efficient multi-tenant document popularity


(ppearcy) #1

Hey,
I have documents that are shared between different sites. Each wants
to measure the popularity of a document and be able to sort on this
value or have it impact the search relevance.
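One common way to let popularity "impact the search relevance" is to multiply the text score by a dampened popularity factor. The sketch below is a plain-Python illustration under that assumption. The `blended_score` function, the log1p weighting and the site names are hypothetical. This is not an Elasticsearch API.

```python
import math

# Illustrative per-site popularity counts for one shared document
# (site names and numbers are made up).
popularity = {"site_a": 120, "site_b": 3}


def blended_score(text_score: float, site: str, pop: dict) -> float:
    """Scale a text-relevance score by the requesting site's popularity.

    log1p dampens large counts so a very popular document cannot drown
    out textual relevance entirely; a count of 0 (or a site with no
    entry) leaves the score unchanged.
    """
    return text_score * (1.0 + math.log1p(pop.get(site, 0)))


print(blended_score(2.0, "site_a", popularity))  # boosted
print(blended_score(2.0, "site_c", popularity))  # no entry: stays 2.0
```

Because the factor depends on `site`, the same shared document can rank differently on each site while the text score is computed once.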

There is the obvious naive solution to have a field per site, per
document to measure this. This would get updated by some batch process
at some interval. I have some concerns about how well this will scale, as
we'd be re-indexing documents almost constantly, and in most cases these
are multi-page reports.
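A toy model of the field-per-site approach makes the scaling concern concrete. Every batch update of a few popularity numbers re-indexes the whole report. The classes, field names and the "bytes re-indexed" cost measure are hypothetical stand-ins, not how any search engine actually accounts for indexing work.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    doc_id: str
    body: str  # the (often multi-page) report text
    popularity: dict = field(default_factory=dict)  # site -> score


class NaiveIndex:
    """Toy model of the field-per-site layout: any change re-indexes the whole doc."""

    def __init__(self) -> None:
        self.docs = {}
        self.bytes_reindexed = 0  # stand-in for indexing cost

    def index(self, doc: Report) -> None:
        self.docs[doc.doc_id] = doc
        # Model re-indexing cost as the size of the full body.
        self.bytes_reindexed += len(doc.body.encode("utf-8"))

    def apply_batch(self, scores: dict) -> None:
        """scores maps doc_id -> {site: score}, as produced by a periodic batch job."""
        for doc_id, per_site in scores.items():
            doc = self.docs[doc_id]
            doc.popularity.update(per_site)
            self.index(doc)  # a few numbers changed, the whole report is re-indexed

    def sorted_for_site(self, site: str) -> list:
        """Doc ids ordered by one site's popularity, highest first."""
        return sorted(self.docs, key=lambda d: self.docs[d].popularity.get(site, 0), reverse=True)


idx = NaiveIndex()
idx.index(Report("r1", "x" * 50_000))
idx.index(Report("r2", "y" * 80_000))
idx.bytes_reindexed = 0  # count only the batch updates
idx.apply_batch({"r1": {"site_a": 10}, "r2": {"site_a": 25, "site_b": 4}})
print(idx.sorted_for_site("site_a"))  # ['r2', 'r1']
print(idx.bytes_reindexed)  # 130000
```

Sorting is trivial in this layout. The cost is in the writes, which grow with report size rather than with the size of the popularity data.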

It seems that the parent/child support might be able to fulfill this
case, but I am still trying to wrap my head around use cases for this
functionality. If I had the main parent document, and each child
document is a per site popularity score, I'd be able to update just
the children.
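The layout described above can be sketched as a toy in-memory model. The report is the parent, and each (report, site) score is a small child that points at it, so a popularity update rewrites only that child. The class, the `parent:site` id scheme and the cost counter are hypothetical illustrations, not Elasticsearch's parent/child implementation.

```python
import json


class ParentChildIndex:
    """Toy model: the report is the parent; each (report, site) score is a small child."""

    def __init__(self) -> None:
        self.parents = {}   # parent_id -> report body
        self.children = {}  # child_id -> {"parent": ..., "site": ..., "score": ...}
        self.bytes_reindexed = 0  # stand-in for indexing cost

    def index_parent(self, parent_id: str, body: str) -> None:
        self.parents[parent_id] = body
        self.bytes_reindexed += len(body.encode("utf-8"))

    def set_popularity(self, parent_id: str, site: str, score: int) -> None:
        # Hypothetical id scheme: one child per (parent, site) pair, so a new
        # score overwrites the previous child instead of adding another.
        child_id = f"{parent_id}:{site}"
        child = {"parent": parent_id, "site": site, "score": score}
        self.children[child_id] = child
        # Only the small child document is (re-)indexed; the parent is untouched.
        self.bytes_reindexed += len(json.dumps(child).encode("utf-8"))


pc = ParentChildIndex()
pc.index_parent("r1", "x" * 50_000)
pc.bytes_reindexed = 0  # count only the popularity updates
pc.set_popularity("r1", "site_a", 10)
pc.set_popularity("r1", "site_a", 12)  # overwrites the same child
print(len(pc.children), pc.bytes_reindexed)
```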

However, I'd need to be able to search against the parent docs and
sort on a child doc value. Is that possible?

Are there any alternatives I may be overlooking?

Thanks,
Paul


(Shay Banon) #2

Searching on parent docs (having the "main" query execute on them) while
having child docs affect only the scoring of the matched parents is not
possible. I don't think this type of requirement will end up in the
parent/child feature set, as maintaining "parent -> child" pointers would be
costly (but I will still play with it to see if I am mistaken). With how it
currently works, only child -> parent is needed.
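Here is a toy model of the direction argument above. It is not a description of Elasticsearch internals. When children only point at their parent, finding "which parents have a matching child" is a scan over the children. Sorting already-matched parents by a child value, however, needs a parent -> child lookup. The function names and data layout are hypothetical.

```python
def parents_with_matching_child(children: dict, pred) -> set:
    """child -> parent direction: scan the children and collect their parent ids."""
    return {c["parent"] for c in children.values() if pred(c)}


def sort_parents_by_child_score(parent_ids, children: dict, site: str) -> list:
    """Order parents by one child value, which needs the parent -> child direction.

    The parent -> score map is rebuilt on every call here; an engine that
    wanted this to be cheap would have to keep such a map up to date on
    every child write instead.
    """
    score_of = {c["parent"]: c["score"] for c in children.values() if c["site"] == site}
    return sorted(parent_ids, key=lambda p: score_of.get(p, 0), reverse=True)


children = {
    "r1:site_a": {"parent": "r1", "site": "site_a", "score": 10},
    "r2:site_a": {"parent": "r2", "site": "site_a", "score": 25},
    "r2:site_b": {"parent": "r2", "site": "site_b", "score": 4},
}
print(parents_with_matching_child(children, lambda c: c["site"] == "site_b"))  # {'r2'}
matched = {"r1", "r2", "r3"}  # parents matched by the "main" query
print(sort_parents_by_child_score(matched, children, "site_a"))  # ['r2', 'r1', 'r3']
```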

I have some other ideas for solving this and similar requirements
in a different way, but it's a bit down the road (and quite a
big endeavor).

(In reply to Paul, ppearcy@gmail.com, on Thu, Dec 30, 2010 at 7:15 PM.)

