Aggregate results before scoring


(Devin) #1

I'm storing an event calendar where events are often repeated at different times and locations. For simplicity, let's say I'm storing these fields:

  • title
  • description
  • when
  • where
  • uid

(I defined uid to be an identifier for unique events based on title and description, which I'm using in my app to group the events.)

Having a separate document for each instance of a repeating event makes it easy to search and filter, but it bothers me that the tf-idf scoring is not quite correct, since the many duplicates skew the term statistics.
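To make the concern concrete, here is a minimal sketch of how duplicates shift the idf part of the score. It assumes the BM25-style idf formula ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n the number containing the term; the corpus sizes are made-up numbers for illustration, not measurements from a real index.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical corpus: 100 unique events, exactly one of which uses a rare term.
unique_events = 100
idf_unique = bm25_idf(doc_count=unique_events, doc_freq=1)

# Same corpus, but that one event is stored as 50 separate occurrence documents.
repeats = 50
idf_duplicated = bm25_idf(
    doc_count=unique_events - 1 + repeats,  # 99 other events + 50 copies
    doc_freq=repeats,                       # every copy contains the term
)

print(f"idf with unique events only: {idf_unique:.2f}")
print(f"idf with 50 occurrences:     {idf_duplicated:.2f}")
```

The term looks far less rare once the event is repeated, so a frequently recurring event is penalized for terms that are actually distinctive to it.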

I've read about nested datatypes and also results aggregation, but I'm not sure which, if either, is right for me. What's the right way to design an index so that scoring functions only consider "unique" events?


(Alexander Reelsen) #2

This is a tough question, as I have no idea what the correct definition of relevance is when you are searching for calendar entries. You are right that tf-idf scoring changes if you have a recurring event, but is this really a problem? If so, why?

I've been thinking that your search usually has a strong scoring influence based on time (you are very interested in events around now, and even more so in upcoming events compared to previous ones).
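One way to express that kind of time influence is a function_score query with a decay function on the date field. The following is only a sketch of the request body, built as a Python dict: the field names come from the question, while the helper name and the 7-day scale are assumptions chosen for illustration, and `when` is assumed to be mapped as a date.

```python
def time_weighted_query(text: str, scale: str = "7d") -> dict:
    """Build a search body that multiplies text relevance by a time decay.

    `scale` is an illustrative default, not a recommended value.
    """
    return {
        "query": {
            "function_score": {
                # Text relevance on the fields named in the question.
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title", "description"],
                    }
                },
                # Gaussian decay: events close to "now" keep most of their score.
                "functions": [
                    {"gauss": {"when": {"origin": "now", "scale": scale}}}
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = time_weighted_query("yoga class")
```

Note that a gauss decay is symmetric around its origin, so past and upcoming events at the same distance from now are treated alike; preferring upcoming events would need an extra filter or function on top of this.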

You could play around with the nested or the join datatype (indexing a unique event as the parent and the follow-up events as children), but I am not sure that gives you the perfect scoring you would like to have.
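A rough sketch of the join variant, written as Python dicts for the index and search bodies, could look like this. The join field name `relation`, the relation names `event` and `occurrence`, the `keyword` type for `where`, and both helper names are assumptions made for illustration; the sketch also assumes a typeless mapping as used by recent versions. Child documents have to be indexed with routing set to the parent id so that both land on the same shard.

```python
def event_index_body() -> dict:
    """Mapping: title/description live on the parent, when/where on children."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
                "when": {"type": "date"},
                "where": {"type": "keyword"},  # assumed type for the location
                # Hypothetical join field: one "event" has many "occurrence"s.
                "relation": {
                    "type": "join",
                    "relations": {"event": "occurrence"},
                },
            }
        }
    }


def upcoming_events_query(text: str) -> dict:
    """Score unique events by text, keep only those with a future occurrence."""
    return {
        "query": {
            "bool": {
                # Scored against parent documents (one per unique event).
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "description"],
                        }
                    }
                ],
                # Non-scoring filter on the child occurrences.
                "filter": [
                    {
                        "has_child": {
                            "type": "occurrence",
                            "query": {"range": {"when": {"gte": "now"}}},
                            "inner_hits": {},  # return the matching occurrences
                        }
                    }
                ],
            }
        }
    }


index_body = event_index_body()
search_body = upcoming_events_query("yoga class")
```

Because only the parent documents carry title and description, each unique event should contribute once to the term statistics of those fields, though whether the resulting ranking is what you want still has to be checked against real data.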

