Elastic Percolator: avoid returning the same result twice to the same user


(Fresh83) #1

So I've created an Elasticsearch percolator. Here's how it works:

• User signs up
• User is required to add their college classes
• User is asked to create a hashtag-based bio of their interests, e.g. #Chipotle #Boxing #MrRobot #FightClub #Drawing
• User submits data. Our API analyzes the hashtag-based bio, creates an array of all the hashtags in it, and then creates a percolator, which currently looks like this:

    PUT /discovery/.percolator/3
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "college": {
                "query": "Oakland University",
                "type": "phrase"
            }}}
          ],
          "should": [
            { "match": { "college_class": "ART_400" }},
            { "match": { "college_class": "BIO_200" }},
            { "match": { "hashtag_bio": "#Boxing" }},
            { "match": { "hashtag_bio": "#MrRobot" }},
            { "match": { "hashtag_bio": "#FightClub" }},
            { "match": { "hashtag_bio": "#Running" }}
          ],
          "minimum_should_match": 1
        }
      }
    }
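
A minimal Python sketch of the bio-to-percolator step described above, assuming the API simply pulls out `#word` tokens with a regex. `extract_hashtags` and `build_percolator_query` are hypothetical names, since the post doesn't show the actual API code, and the sketch only builds the JSON body; sending it to Elasticsearch is left out.

```python
import json
import re
from typing import Dict, List

# Hypothetical pattern: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")

def extract_hashtags(bio: str) -> List[str]:
    """Return the hashtags in a bio, in first-seen order, without duplicates."""
    return list(dict.fromkeys(HASHTAG_RE.findall(bio)))

def build_percolator_query(college: str, classes: List[str], bio: str) -> Dict:
    """Build a percolator body shaped like the one above."""
    should = [{"match": {"college_class": c}} for c in classes]
    should += [{"match": {"hashtag_bio": tag}} for tag in extract_hashtags(bio)]
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"college": {"query": college, "type": "phrase"}}}
                ],
                "should": should,
                "minimum_should_match": 1,
            }
        }
    }

if __name__ == "__main__":
    body = build_percolator_query(
        "Oakland University",
        ["ART_400", "BIO_200"],
        "#Boxing #MrRobot #FightClub #Running",
    )
    # This JSON would be sent as the body of PUT /discovery/.percolator/<id>
    print(json.dumps(body, indent=2))
```

Deduplicating with `dict.fromkeys` keeps the order the user typed the tags in, so the generated `should` list lines up with the bio.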

From there, the user is taken into the app, where we try to curate and suggest the best possible friend matches based on their hashtags and classes. To do that, we make a query like this:

    POST /discovery/user/_percolate/
    {
      "doc": {
        "hashtag_bio": "Hi my name is Chuck and I like #Chipotle  #MrRobot  and #fightClub",
        "college": "Oakland University"
      }
    }
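
To make the percolate call above concrete, here's a rough in-memory Python imitation of what the percolator does: it runs every stored query against the one incoming document and returns the IDs of the queries that match. It's only a sketch for reasoning about the behaviour, not how Elasticsearch implements it. `analyze` is a crude stand-in for the standard analyzer (an assumption, since the post doesn't show the mapping), and only the `bool`/`match` features used in this post are handled; `clause_matches`, `bool_matches` and `percolate` are hypothetical names.

```python
import re
from typing import Dict, List

def analyze(text: str) -> List[str]:
    # Crude stand-in for the standard analyzer (assumption): lowercase,
    # keep runs of word characters, drop punctuation such as '#'.
    return re.findall(r"\w+", text.lower())

def clause_matches(clause: dict, doc: dict) -> bool:
    """Evaluate one {"match": {field: ...}} clause against a document."""
    field, spec = next(iter(clause["match"].items()))
    if isinstance(spec, dict):
        query, phrase = spec["query"], spec.get("type") == "phrase"
    else:
        query, phrase = spec, False
    doc_tokens = analyze(str(doc.get(field, "")))
    query_tokens = analyze(str(query))
    if not query_tokens:
        return False
    if phrase:
        # Phrase: all query tokens must appear consecutively, in order
        n = len(query_tokens)
        return any(doc_tokens[i:i + n] == query_tokens
                   for i in range(len(doc_tokens) - n + 1))
    # Plain match (default "or" operator): any shared token is a hit
    return any(t in doc_tokens for t in query_tokens)

def bool_matches(query: dict, doc: dict) -> bool:
    b = query["query"]["bool"]
    must = b.get("must", [])
    should = b.get("should", [])
    if not all(clause_matches(c, doc) for c in must):
        return False
    if any(clause_matches(c, doc) for c in b.get("must_not", [])):
        return False
    # With a must clause, should is optional unless minimum_should_match says otherwise
    default_min = 0 if must else (1 if should else 0)
    hits = sum(clause_matches(c, doc) for c in should)
    return hits >= b.get("minimum_should_match", default_min)

def percolate(stored: Dict[str, dict], doc: dict) -> List[str]:
    """Reverse search: return the IDs of stored queries that match doc."""
    return [qid for qid, q in stored.items() if bool_matches(q, doc)]

if __name__ == "__main__":
    stored = {
        "3": {"query": {"bool": {
            "must": [{"match": {"college": {"query": "Oakland University",
                                            "type": "phrase"}}}],
            "should": [{"match": {"hashtag_bio": "#MrRobot"}},
                       {"match": {"hashtag_bio": "#FightClub"}}],
            "minimum_should_match": 1,
        }}},
    }
    doc = {"hashtag_bio": "Hi my name is Chuck and I like #Chipotle #MrRobot and #fightClub",
           "college": "Oakland University"}
    print(percolate(stored, doc))  # → ['3']
```

Under this analyzer assumption, Chuck's `#fightClub` still hits the stored `#FightClub` clause, because both sides are lowercased and the `#` is dropped before comparison.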

My question is: is there a way to add, to each user's percolator, the userIDs of all the users who have already swiped on that user, so we avoid showing repeats and can return results in chunks?

EXAMPLE:  

    PUT /discovery/.percolator/3
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "college": {
                "query": "Oakland University",
                "type": "phrase"
            }}}
          ],
          "should": [
            { "match": { "college_class": "ART_400" }},
            { "match": { "college_class": "BIO_200" }},
            { "match": { "hashtag_bio": "#Boxing" }},
            { "match": { "hashtag_bio": "#MrRobot" }},
            { "match": { "hashtag_bio": "#FightClub" }},
            { "match": { "hashtag_bio": "#Running" }}
          ],
          "minimum_should_match": 1,
          "must_not": [
            { "match": { "userID": "2" }},
            { "match": { "userID": "3" }},
            { "match": { "userID": "4" }},
            { "match": { "userID": "5" }},
            { "match": { "userID": "6" }}
          ]
        }
      }
    }
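
Writing the exclusions by hand gets unwieldy as the list grows, so here's a hedged Python sketch that adds them programmatically. It assumes `userID` is indexed as a single exact token (for example a `not_analyzed`/keyword field; the post doesn't show the mapping) and swaps the per-ID `match` clauses for one `terms` clause, which matches a document whose `userID` equals any value in the list, so wrapping it in `must_not` excludes the same documents. `with_exclusions` is a hypothetical helper name. For `must_not` to have anything to match, the percolated document would also need a `userID` field; the `_percolate` request shown earlier only sends `hashtag_bio` and `college`.

```python
import copy
import json
from typing import Iterable

def with_exclusions(percolator: dict, swiped_user_ids: Iterable[str]) -> dict:
    """Return a copy of a percolator body whose bool query skips the given userIDs.

    Any existing must_not list is replaced when IDs are given.
    """
    body = copy.deepcopy(percolator)  # leave the caller's dict untouched
    ids = sorted(set(swiped_user_ids))  # dedupe; order is irrelevant to terms
    if ids:
        # One terms clause instead of one match clause per ID (assumes userID
        # is indexed as a single exact token, e.g. a not_analyzed/keyword field)
        body["query"]["bool"]["must_not"] = [{"terms": {"userID": ids}}]
    return body

if __name__ == "__main__":
    base = {"query": {"bool": {
        "must": [{"match": {"college": {"query": "Oakland University",
                                        "type": "phrase"}}}],
        "should": [{"match": {"college_class": "ART_400"}},
                   {"match": {"hashtag_bio": "#Boxing"}}],
        "minimum_should_match": 1,
    }}}
    swiped = ["2", "3", "4", "5", "6"]
    print(json.dumps(with_exclusions(base, swiped), indent=2))
```

Because the exclusion list lives inside each stored query, every new swipe still means re-indexing (re-PUTting) that user's percolator with the updated list.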

Is using must_not like this efficient if there are literally hundreds or thousands of userIDs? Is there a better way?