So I've created an Elasticsearch percolator. How it works:
• Users sign up
• User is required to add their college classes
• User is asked to create a hashtag-based bio of their interests, e.g. #Chipotle #Boxing #MrRobot #FightClub #Drawing
• User submits data - our API analyzes the hashtag-based bio, creates an array of all the hashtags in the bio, and then registers a percolator query that currently looks like this:
PUT /discovery/.percolator/3
{
    "query": {
        "bool": {
            "must": [
                { "match": { "college": {
                    "query": "Oakland University",
                    "type": "phrase"
                } } }
            ],
            "should": [
                { "match": { "college_class": "ART_400" } },
                { "match": { "college_class": "BIO_200" } },
                { "match": { "hashtag_bio": "#Boxing" } },
                { "match": { "hashtag_bio": "#MrRobot" } },
                { "match": { "hashtag_bio": "#FightClub" } },
                { "match": { "hashtag_bio": "#Running" } }
            ],
            "minimum_should_match": 1
        }
    }
}
From there the user is taken into the app, where we try to curate and suggest the best possible friend matches for them based on their hashtags and classes, i.e. a percolate query like this is made:
POST /discovery/user/_percolate/
{
    "doc": {
        "hashtag_bio": "Hi my name is Chuck and I like #Chipotle #MrRobot and #fightClub",
        "college": "Oakland University"
    }
}
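For reference, the percolate response then gives back the IDs of the registered queries (i.e. users) that matched the document, and we use those as the candidate matches. It comes back roughly like this (the values shown are just an illustration):

{
    "took": 5,
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "total": 1,
    "matches": [
        { "_index": "discovery", "_id": "3" }
    ]
}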
My question is: Is there a way to add the user IDs of all the users who have already swiped on each user to that user's percolator query, so we can avoid showing repeats and enable chunked results?
EXAMPLE:
PUT /discovery/.percolator/3
{
    "query": {
        "bool": {
            "must": [
                { "match": { "college": {
                    "query": "Oakland University",
                    "type": "phrase"
                } } }
            ],
            "should": [
                { "match": { "college_class": "ART_400" } },
                { "match": { "college_class": "BIO_200" } },
                { "match": { "hashtag_bio": "#Boxing" } },
                { "match": { "hashtag_bio": "#MrRobot" } },
                { "match": { "hashtag_bio": "#FightClub" } },
                { "match": { "hashtag_bio": "#Running" } }
            ],
            "minimum_should_match": 1,
            "must_not": [
                { "match": { "userID": "2" } },
                { "match": { "userID": "3" } },
                { "match": { "userID": "4" } },
                { "match": { "userID": "5" } },
                { "match": { "userID": "6" } }
            ]
        }
    }
}
Is using must_not like this efficient if there were literally hundreds or thousands of user IDs? Is there a better way?
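The only alternative I've come up with so far is collapsing all of the must_not match clauses into a single terms query, but I'm not sure whether that actually changes anything at that scale:

"must_not": [
    { "terms": { "userID": ["2", "3", "4", "5", "6"] } }
]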