Recommendation System for retail

Daniel_Zuluaga · June 29, 2018, 2:45pm

Hello everyone, I want to ask for help for a recommendation system that I'm trying to put together.

I have data for millions of users that buy in thousands of retail shops (Food, clothes, services). What I need is, given a shop X, recommend N users that might be interested in buying in shop X.

The data I currently have in Elasticsearch looks like this:

PUT recommendations/shop/1
{ "frequent_users": ["user1", "user2", "user3", "user4", "user5", "user6"] }
PUT recommendations/shop/2
{ "frequent_users": ["user1", "user2", "user3"] }
PUT recommendations/shop/3
{ "frequent_users": ["user4", "user5", "user6"] }
PUT recommendations/shop/4
{ "frequent_users": ["user1", "user6"] }

Keep in mind that I have access to every transaction made by each user in all the shops. I only grouped the data like this in order to index it in ES, and I can change how the information is indexed if needed.

The part where I'm lost is querying the information with the significant_terms aggregation. As mentioned above, the query I need is: given a shop X, give me a list of users that might want to shop there. This is the query I have so far:

POST recommendations/shop/_search
{
    "query": {
        "match": {
            "frequent_users": "user1"
        }
    },
    "aggregations": {
        "clients": {
            "significant_terms": {
                "field": "frequent_users.keyword",
                "min_doc_count": 1
            }
        }
    }
}

Given a single user (user1), this query retrieves similar users based on the shops they have in common. This is the response to the query above:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "frequent_users": [
            "user1",
            "user2",
            "user3",
            "user4",
            "user5",
            "user6"
          ]
        }
      },
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "4",
        "_score": 0.19856805,
        "_source": {
          "frequent_users": [
            "user1",
            "user6"
          ]
        }
      },
      {
        "_index": "recommendations",
        "_type": "shop",
        "_id": "2",
        "_score": 0.16853254,
        "_source": {
          "frequent_users": [
            "user1",
            "user2",
            "user3"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "clients": {
      "doc_count": 3,
      "bg_count": 4,
      "buckets": [
        {
          "key": "user1",
          "doc_count": 3,
          "score": 0.3333333333333333,
          "bg_count": 3
        },
        {
          "key": "user2",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user6",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user3",
          "doc_count": 2,
          "score": 0.22222222222222215,
          "bg_count": 2
        },
        {
          "key": "user4",
          "doc_count": 1,
          "score": 0.11111111111111108,
          "bg_count": 1
        },
        {
          "key": "user5",
          "doc_count": 1,
          "score": 0.11111111111111108,
          "bg_count": 1
        }
      ]
    }
  }
}
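
For reference, the bucket scores in this response are consistent with JLH, the default significant_terms heuristic: (fg% - bg%) * (fg% / bg%), where fg% is the share of matched documents containing the term and bg% is its share of the whole index. A minimal Python sketch (the function name is only illustrative) that recomputes them from doc_count and bg_count:

```python
def jlh_score(doc_count, subset_size, bg_count, superset_size):
    """JLH significance: absolute change times relative change in frequency."""
    fg = doc_count / subset_size     # share of matching docs that contain the term
    bg = bg_count / superset_size    # share of all docs that contain the term
    if fg <= bg:
        return 0.0                   # term is not more common in the matched set
    return (fg - bg) * (fg / bg)

# Counts copied from the response above: 3 matching docs, 4 docs in the index.
for user, doc_count, bg_count in [("user1", 3, 3), ("user2", 2, 2), ("user4", 1, 1)]:
    print(user, round(jlh_score(doc_count, 3, bg_count, 4), 4))
```

With only four documents, every user who shares a shop with user1 receives a score, so the ranking here mostly reflects raw co-occurrence.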

Thanks in advance! Any help would be greatly appreciated.

Mark_Harwood · June 29, 2018, 4:52pm

Reading it back, the task you have is to "find more people like the people who visit Store X".
The challenge is identifying a certain type of person.
Judging by the data you have presented, you know nothing about these people in terms of age, gender, location, friendships etc. The only thing you can use to describe "that type" of person is the list of other stores they visit, in the hope that it somehow "defines" them. For that you would need a person-centric customer index, e.g.

{ "user": 1, "visited_stores": ["x", "y", "z"] }
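
One way to get from the shop-centric documents in the question to this person-centric shape is to invert the mapping before indexing. A rough Python sketch, assuming the shop documents are available as a plain dict (to_user_docs is a made-up helper name, not an Elasticsearch API):

```python
from collections import defaultdict

def to_user_docs(shop_docs):
    """Turn {shop_id: [user, ...]} into a list of person-centric documents."""
    stores_by_user = defaultdict(set)
    for shop_id, users in shop_docs.items():
        for user in users:
            stores_by_user[user].add(shop_id)
    return [
        {"user": user, "visited_stores": sorted(stores)}
        for user, stores in sorted(stores_by_user.items())
    ]

# The four shop documents from the original question.
shop_docs = {
    "1": ["user1", "user2", "user3", "user4", "user5", "user6"],
    "2": ["user1", "user2", "user3"],
    "3": ["user4", "user5", "user6"],
    "4": ["user1", "user6"],
}
for doc in to_user_docs(shop_docs):
    print(doc)
```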

You'd then query for the existing store X customers and look for significant terms on the "visited_stores" field to see if there was anything "uncommonly common" about them. You would then use these suggested stores as a query - minus those people who are already Store X visitors eg

{
    "bool": {
        "should": [
            { "term": { "visited_stores": "store_significantly_like_x" }},
            { "term": { "visited_stores": "another_store_significantly_like_x" }},
            ...
        ],
        "must_not": [
            { "term": { "visited_stores": "x" }}
        ]
    }
}

Note you should use the verbose form of multiple term queries in a should clause rather than a single terms query, because Elasticsearch assumes you don't want relevance scoring on terms queries, and we absolutely do want IDF scoring so that rare stores like Joe's Skate Shack count for more than common ones like Walmart.
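
If the list of similar stores comes out of a program rather than being typed by hand, the verbose form is easy to generate. A minimal Python sketch, where lookalike_query and its parameters are illustrative names and the store values are placeholders:

```python
import json

def lookalike_query(similar_stores, target_store, field="visited_stores"):
    """One scored term clause per similar store; exclude existing visitors."""
    return {
        "query": {
            "bool": {
                # Separate term clauses keep relevance (IDF) scoring per store.
                "should": [{"term": {field: store}} for store in similar_stores],
                "must_not": [{"term": {field: target_store}}],
            }
        }
    }

print(json.dumps(lookalike_query(["y", "z"], "x"), indent=2))
```

Because the bool query has no must or filter clause, at least one should clause has to match for a user to be returned.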

Daniel_Zuluaga · June 29, 2018, 6:17pm

Hello Mark, thank you so much for your help!

I do have more information about each customer. This is a sample of the table, which holds millions of transactions:

consumer_id	trx_date	trx_time	value	commerce_id	tj	name	gender	age	occupation	email	income	lat	lon	city	mcc
1	20180528	11:44:50	181400	10803914	1	name	F	63	NULL	@yahoo.com	7	11,00892419	-74,83473888	BARRANQUILLA	763
2	20180516	19:17:38	131,58	8060000000	1	name	M	44	Empleado	@hotmail.com	12	4,650019173	-74,12242269	BOGOTA	NULL
3	20180516	15:23:46	1181040	612250000	1	name	M	39	Empleado	@gmail.com	8	4,744751454	-74,086149	BOGOTA	NULL
4	20180524	14:06:57	116150	12321071	1	name	F	39	Independiente	@hotmail.com	4	11,00649976	-74,83454968	BARRANQUILLA	9399

The important fields are gender, age, occupation, income, lat, lon and city. The shop ID is the commerce_id field. I could also add the shop name to this data, but I'm using commerce_id as the unique shop identifier.

Mark_Harwood · June 30, 2018, 12:03pm

Significant terms is designed to find individual terms that are correlated with your query.
However, the type of people who visit store X might be best identified by multiple terms in combination e.g. the store is found to be popular with women aged between 20 and 30. That particular combo of information is not currently a single term in the index so can't be discovered using the significant_terms aggregation (unless you index using special single-token strings eg Male+Teenager+London). You may find doing some analysis in R or similar would be a better way to discover the combinations of attributes that define Store X customers and then use elasticsearch to query for customers with those attributes.

Daniel_Zuluaga · July 3, 2018, 4:16pm

What if I index using your suggestion of a person-centric customer index, and add the demographic information like so:

{ "user": 1, "gender": "Male", "city": "aaa", "age_group": "Middleage", "occupation": "Zzz", "visited_stores": [1, 2, 3, 4, 5] }
{ "user": 2, "gender": "Female", "city": "aaa", "age_group": "Teenager", "occupation": "Xxx", "visited_stores": [1, 2, 3, 4, 5] }
{ "user": 3, "gender": "Male", "city": "bbb", "age_group": "Middleage", "occupation": "Zzz", "visited_stores": [1, 2, 3, 4, 5] }
{ "user": 4, "gender": "Female", "city": "ccc", "age_group": "Teenager", "occupation": "Www", "visited_stores": [1, 2, 3, 4, 5] }

Would something like this work for asking the question as you stated?

"Find more people like the people who visit Store X"

Mark_Harwood · July 3, 2018, 4:22pm

"Would something like this work"

Yes, it would help, but as I mentioned in my previous comment, I suspect it would also be useful to index combinations of demographic info if you want to discover, for example, that shop X is predominantly for middle-aged men. You'd need a field that concatenates age_group and gender into a single term.
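
As a sketch of what that could look like at indexing time, assuming the documents are built in Python before being sent to Elasticsearch (add_combo and the "+" separator are illustrative choices):

```python
def add_combo(doc, fields=("gender", "age_group"), sep="+"):
    """Return a copy of doc with the chosen attributes joined into one token."""
    combo = sep.join(str(doc[name]) for name in fields)
    return {**doc, "combo": combo}

user = {"user": 1, "gender": "Male", "city": "aaa", "age_group": "Middleage"}
print(add_combo(user)["combo"])  # Male+Middleage
```

The more attributes go into one token, the rarer each token becomes, so it may be worth trying a few different field combinations.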

Daniel_Zuluaga · July 3, 2018, 4:37pm

So it would be something similar with a new field, like this:

{ "user": 1, "gender": "Male", "city": "aaa", "age_group": "Middleage", "occupation": "Zzz", "visited_stores": [1, 2, 3, 4, 5] , "combo": "Male+aaa+Middleage+Zzz" }
{ "user": 2, "gender": "Female", "city": "aaa", "age_group": "Teenager", "occupation": "Xxx", "visited_stores": [1, 2, 3, 4, 5] , "combo": "Female+aaa+Teenager+Xxx" }
{ "user": 3, "gender": "Male", "city": "bbb", "age_group": "Middleage", "occupation": "Zzz", "visited_stores": [1, 2, 3, 4, 5] , "combo": "Male+bbb+Middleage+Zzz" }
{ "user": 4, "gender": "Female", "city": "ccc", "age_group": "Teenager", "occupation": "Www", "visited_stores": [1, 2, 3, 4, 5] , "combo": "Female+ccc+Teenager+Www" }

Could you then help me with what the query would look like once the information is indexed this way?

Mark_Harwood · July 3, 2018, 5:21pm

You'd need to map combo as a keyword field type then run a query something like this:

POST users/user/_search
{
    "query": {
        "match": {
            "visited_stores": "1"
        }
    },
    "size": 0,
    "aggregations": {
        "clients": {
            "significant_terms": {
                "field": "combo"
            }
        }
    }
}

That would give you the stereotypes for store 1 visitors.
Then you do the query to find the people who haven't visited store 1 but fit the store 1 stereotype:

POST users/user/_search
{
    "query": {
        "bool": {
            "should": [
                { "term": { "combo": "Male+aaa+Middleage+Zzz" }},
                ... other significant stereotypes
            ],
            "must_not": [
                { "term": { "visited_stores": "1" }}
            ]
        }
    }
}
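
To connect the two steps in code, the bucket keys from the first response can be turned into the should clauses of the second query. A rough Python sketch, assuming the response has already been parsed into a dict; top_stereotypes is a hypothetical helper and the sample numbers are invented for illustration:

```python
def top_stereotypes(response, agg_name="clients", top_n=5):
    """Return the keys of the highest-scoring significant_terms buckets."""
    buckets = response["aggregations"][agg_name]["buckets"]
    ranked = sorted(buckets, key=lambda b: b["score"], reverse=True)
    return [b["key"] for b in ranked[:top_n]]

# Invented response fragment with the same shape as a significant_terms result.
sample_response = {
    "aggregations": {
        "clients": {
            "buckets": [
                {"key": "Female+aaa+Teenager+Xxx", "doc_count": 10, "score": 0.2, "bg_count": 70},
                {"key": "Male+aaa+Middleage+Zzz", "doc_count": 40, "score": 0.9, "bg_count": 60},
            ]
        }
    }
}

# One term clause per stereotype, ready to drop into the bool "should" list.
should = [{"term": {"combo": key}} for key in top_stereotypes(sample_response, top_n=2)]
print(should)
```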

Daniel_Zuluaga · July 18, 2018, 7:52pm

Hello Mark, I just wanted to thank you for all your help!

I have a solution working at the moment and am planning to deploy it to production soon.

Mark_Harwood · July 19, 2018, 8:16am

Good to hear! Hope all goes well.

It might be worth considering using the sampler aggregation (or the diversified_sampler) in conjunction with significant terms. It can improve both the search response times and the quality of results - see https://www.youtube.com/watch?v=azP15yvbOBA
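
As a sketch, the request body for the stereotype query wrapped in a sampler could be built like this in Python. The shard_size of 200 is only a placeholder to tune, and sampled_significant_terms is an illustrative name:

```python
import json

def sampled_significant_terms(store_id, shard_size=200):
    """Search body: significant_terms over only the top-scoring docs per shard."""
    return {
        "query": {"match": {"visited_stores": store_id}},
        "size": 0,
        "aggregations": {
            "sample": {
                # The sampler limits the sub-aggregation to the best matches on each shard.
                "sampler": {"shard_size": shard_size},
                "aggregations": {
                    "clients": {"significant_terms": {"field": "combo"}}
                },
            }
        },
    }

print(json.dumps(sampled_significant_terms("1"), indent=2))
```

The significant_terms buckets then appear one level deeper in the response, under the "sample" aggregation.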

Another tip for making sure your system is effective is to benchmark it on existing data to measure how good your store visitor profiles are. To do this, take a percentage of the users who have visited store X, remove store X from their list of visited stores and add it to a new field called "heldBackStores". You can then benchmark your recommendation system by counting how many users it recommends for what you have derived as the "Store X profile of customer", and how many of the genuine store X users (those with an entry in heldBackStores) are among them.
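
The hold-back idea can be prototyped outside Elasticsearch first. A minimal Python sketch, assuming user documents are plain dicts; hold_back and recall are hypothetical helpers, and the "recommended" list stands in for whatever your real recommendation query returns:

```python
import random

def hold_back(users, store, fraction, seed=0):
    """Move `store` into heldBackStores for a random fraction of its visitors."""
    rng = random.Random(seed)
    visitors = [u for u in users if store in u["visited_stores"]]
    chosen = {u["user"] for u in rng.sample(visitors, int(len(visitors) * fraction))}
    masked = []
    for u in users:
        if u["user"] in chosen:
            kept = [s for s in u["visited_stores"] if s != store]
            masked.append({**u, "visited_stores": kept, "heldBackStores": [store]})
        else:
            masked.append({**u, "heldBackStores": []})
    return masked

def recall(users, recommended_ids, store):
    """Share of held-back visitors of `store` that were recommended."""
    held = {u["user"] for u in users if store in u["heldBackStores"]}
    if not held:
        return 0.0
    return len(held & set(recommended_ids)) / len(held)

users = [{"user": i, "visited_stores": ["x", "y"]} for i in range(4)]
users.append({"user": 4, "visited_stores": ["y"]})
masked = hold_back(users, "x", fraction=0.5)
# Stand-in recommender: pretend it returned every user not currently listed under x.
recommended = [u["user"] for u in masked if "x" not in u["visited_stores"]]
print(recall(masked, recommended, "x"))
```

Comparing the number of held-back users found against the total number of recommendations also gives a feel for precision, not just recall.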

