Would mlt be a strategy for matching 2 data sets?


(ssriram) #1

I'm looking to deploy a strategy for the following search problem
using Elasticsearch.

A simplified version of the problem is as follows:

Say I have two data sets:
- MyLikes
  - contains multiple users, with multiple records for each user
  - each record has 2 fields: username & liketext
- ForSale
  - contains multiple product records
  - each record has 2 fields: product_title & description

I want to match MyLikes against ForSale and generate a third dataset
called Sales_I_Might_Like.

The strategies I have in mind are:

  1. Turn each ForSale entry into a tagsoup by removing stopwords and
    use these tags to go against MyLikes and build a table of users
    for each Product ranked by number of hits
  2. Do a more_like_this search for each ForSale Record against the
    MyLikes dataset and use those results.
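
To make strategy 1 concrete, here is a rough in-memory sketch in Python. It is only an illustration: the stopword list, the helper names `tag_soup` and `rank_users`, and the sample records are all made up, and a real deployment would run the tags as a query against the MyLikes index rather than loop over records.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be larger).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tag_soup(text):
    """Lowercase the text, strip basic punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def rank_users(product, likes):
    """Rank users by how many product tags appear across their like records."""
    tags = tag_soup(product["product_title"] + " " + product["description"])
    hits = Counter()
    for like in likes:
        shared = tags & tag_soup(like["liketext"])
        if shared:
            hits[like["username"]] += len(shared)
    return hits.most_common()


# Hypothetical sample data, using the field names from the post.
likes = [
    {"username": "alice", "liketext": "vintage road bikes"},
    {"username": "alice", "liketext": "steel bike frames"},
    {"username": "bob", "liketext": "espresso and the road"},
]
product = {
    "product_title": "Vintage steel road bike",
    "description": "A classic frame for the road.",
}

print(rank_users(product, likes))  # [('alice', 4), ('bob', 1)]
```

Note that "bikes" and "bike" do not match here because nothing is stemmed; an analyzer with stemming would handle that on the search side.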

Are there any other strategies that I could use, and would the
more_like_this strategy be effective?
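
For strategy 2, the request body for one ForSale record might look roughly like the sketch below. The helper name `mlt_query` and the threshold defaults are my own assumptions. The `like` parameter is the name used by recent Elasticsearch releases; older releases used `like_text` instead, so check the docs for the version in use.

```python
import json


def mlt_query(product, min_term_freq=1, min_doc_freq=1, max_query_terms=25):
    """Build a more_like_this search body from one ForSale record.

    The body targets the liketext field of the MyLikes records.
    The thresholds are illustrative guesses, not tuned values.
    """
    like_text = product["product_title"] + " " + product["description"]
    return {
        "query": {
            "more_like_this": {
                "fields": ["liketext"],
                "like": like_text,
                "min_term_freq": min_term_freq,
                "min_doc_freq": min_doc_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Hypothetical record, using the field names from the post.
body = mlt_query({
    "product_title": "Vintage steel road bike",
    "description": "A classic frame for the road.",
})
print(json.dumps(body, indent=2))
```

The body would be sent as a search against whichever index holds MyLikes, once per ForSale record, with the hits grouped by username to fill Sales_I_Might_Like. Lowering `min_term_freq` and `min_doc_freq` to 1 matters for short texts like these, since few terms repeat.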

Thanks
S. Sriram


(Shay Banon) #2

Yes, that would work.

On Wed, Jan 5, 2011 at 12:12 AM, ssriram ssriram@gmail.com wrote:

