Hello everyone, I'd like some help with a recommendation system that I'm trying to put together.
I have data for millions of users who buy from thousands of retail shops (food, clothes, services). What I need is, given a shop X, to recommend N users who might be interested in buying from shop X.
The data I currently have in Elasticsearch looks like this:
PUT recommendations/shop/1
{ "frequent_users": ["user1", "user2", "user3", "user4", "user5", "user6"] }
PUT recommendations/shop/2
{ "frequent_users": ["user1", "user2", "user3"] }
PUT recommendations/shop/3
{ "frequent_users": ["user4", "user5", "user6"] }
PUT recommendations/shop/4
{ "frequent_users": ["user1", "user6"] }
Keep in mind that I have access to every transaction made by each user in all the shops. I only grouped the data like this in order to index it in ES, and I can change how the information is indexed if needed.
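To make the grouping concrete, here is a minimal Python sketch of how the raw transactions could be turned into the per-shop documents shown above. The function name `build_shop_docs`, the `(user, shop)` tuple layout, and the `min_purchases` threshold for what counts as a "frequent" user are all assumptions for illustration, not the actual pipeline.

```python
from collections import Counter, defaultdict


def build_shop_docs(transactions, min_purchases=2):
    """Group (user, shop) transactions into one document per shop.

    min_purchases is a hypothetical cutoff: a user is listed as
    "frequent" in a shop once they bought there at least that many times.
    """
    purchases = defaultdict(Counter)  # shop id -> Counter of user ids
    for user, shop in transactions:
        purchases[shop][user] += 1

    return {
        shop: {
            "frequent_users": sorted(
                user for user, n in users.items() if n >= min_purchases
            )
        }
        for shop, users in purchases.items()
    }


# Toy input: user1 bought twice in shop "1", user2 only once.
sample = [("user1", "1"), ("user1", "1"), ("user2", "1")]
docs = build_shop_docs(sample)
```

Each value in `docs` has the same shape as the bodies of the PUT requests above, so a different grouping (for example one document per user listing their shops) would only need a change to this step.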
The part where I'm lost is querying the information with the significant_terms aggregation. As mentioned above, the query I need is: given a shop X, give me a list of users who might want to shop there. This is the query I have so far:
POST recommendations/shop/_search
{
    "query": {
        "match": {
            "frequent_users": "user1"
        }
    },
    "aggregations": {
        "clients": {
            "significant_terms": {
                "field": "frequent_users.keyword",
                "min_doc_count": 1
            }
        }
    }
}
Given a single user (user1), this query retrieves similar users based on the shops where they appear together as frequent users. This is the response to the query above:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "frequent_users": [
            "user1",
            "user2",
            "user3",
            "user4",
            "user5",
            "user6"
          ]
        }
      },
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "4",
        "_score": 0.19856805,
        "_source": {
          "frequent_users": [
            "user1",
            "user6"
          ]
        }
      },
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "2",
        "_score": 0.16853254,
        "_source": {
          "frequent_users": [
            "user1",
            "user2",
            "user3"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "clients": {
      "doc_count": 3,
      "bg_count": 4,
      "buckets": [
        {
          "key": "user1",
          "doc_count": 3,
          "score": 0.3333333333333333,
          "bg_count": 3
        },
        {
          "key": "user2",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user6",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user3",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user4",
          "doc_count": 1,
          "score": 0.11111111111111108,
          "bg_count": 1
        },
        {
          "key": "user5",
          "doc_count": 1,
          "score": 0.11111111111111108,
          "bg_count": 1
        }
      ]
    }
  }
}
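For anyone reading the scores in the response, here is a minimal Python sketch that recomputes them from the counts reported in the buckets. It assumes the aggregation is using the default JLH heuristic, where the score is the absolute change times the relative change between the foreground and background frequencies. The function name `jlh_score` is my own, and the counts are copied from the response rather than recomputed from the documents.

```python
def jlh_score(doc_count, fg_total, bg_count, bg_total):
    """JLH-style significance: (fg% - bg%) * (fg% / bg%).

    Assumption: this mirrors the default significant_terms heuristic.
    Terms that are not more frequent in the foreground score 0.
    """
    fg = doc_count / fg_total  # share of matching docs containing the term
    bg = bg_count / bg_total   # share of all docs containing the term
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


# Totals from the "clients" aggregation: doc_count = 3, bg_count = 4.
FG_TOTAL, BG_TOTAL = 3, 4

# Per-bucket (doc_count, bg_count) as reported in the response.
buckets = {
    "user1": (3, 3),
    "user2": (2, 2),
    "user6": (2, 2),
    "user3": (2, 2),
    "user4": (1, 1),
    "user5": (1, 1),
}

scores = {
    user: jlh_score(doc_count, FG_TOTAL, bg_count, BG_TOTAL)
    for user, (doc_count, bg_count) in buckets.items()
}
```

Under that assumption the values agree with the response up to floating-point rounding: about 0.333 for user1, 0.222 for user2, user3 and user6, and 0.111 for user4 and user5. Note that user1, the user I searched for, gets the top score, so it would have to be filtered out of any recommendation list.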
Thanks in advance! Any help would be greatly appreciated.